Abstract

The emerging Web of Data utilizes the web infrastructure to represent and interre- 
late data. The foundational standards of the Web of Data include the Uniform Resource 
Identifier (URI) and the Resource Description Framework (RDF). URIs are used to 
identify resources and RDF is used to relate resources. While RDF has been posited
as a logic language designed specifically for knowledge representation and reasoning, 
it is more generally useful if it can conveniently support other models of computing. In 
order to realize the Web of Data as a general-purpose medium for storing and process- 
ing the world's data, it is necessary to separate RDF from its logic language legacy and 
frame it simply as a data model. Moreover, there is significant advantage in seeing the 
Semantic Web as a particular interpretation of the Web of Data that is focused specif- 
ically on knowledge representation and reasoning. By doing so, other interpretations 
of the Web of Data are exposed that realize RDF in different capacities and in support 
of different computing models. 
1 Introduction

The common conception of the World Wide Web is that of a large-scale, distributed file 
repository [6]. The typical files found on the World Wide Web are Hyper-Text Markup 
Language (HTML) documents and other media such as image, video, and audio files. The 
"World Wide" aspect of the World Wide Web pertains to the fact that all of these files 
have an accessible location that is denoted by a Uniform Resource Locator (URL) [55]; a 
URL denotes what physical machine is hosting the file (i.e. what domain name/IP address), 
where in that physical machine the file is located (i.e. what directory), and finally, which 
protocol to use to retrieve that file from that machine (e.g. http, ftp, etc.). The "Web" 
aspect of the World Wide Web pertains to the fact that a file (typically an HTML docu- 
ment) can make reference (typically an href citation) to another file. In this way, a file on 
machine A can link to a file on machine B and in doing so, a network/graph/web of files 
emerges. The ingenuity of the World Wide Web is that it combines remote file access proto- 
cols and hypermedia and as such, has fostered a revolution in the way in which information 
is disseminated and retrieved — in an open, distributed manner. From this relatively simple 
foundation, a rich variety of uses emerges: from the homepage, to the blog, to the online 
store. 

The World Wide Web is primarily for human consumption. While HTML documents 
are structured according to a machine understandable syntax, the content of the documents 
are written in human readable/writable language (i.e. natural human language). It is only 
through computationally expensive and relatively inaccurate text analysis algorithms that a 
machine can determine the meaning of such documents. For this reason, computationally 
inexpensive keyword extraction and keyword-based search engines are the most prevalent 
means by which the World Wide Web is machine processed. However, the human-readable 
World Wide Web is evolving to support a machine-readable Web of Data. The emerging 
Web of Data utilizes the same referencing paradigm as the World Wide Web, but instead of 
being focused primarily on URLs and files, it is focused on Uniform Resource Identifiers 
(URI) [7] and data.^1 The "Data" aspect of the Web of Data pertains to the fact that a URI
can denote anything that can be assigned an identifier: a physical entity, a virtual entity, 
an abstract concept, etc. The "Web" aspect of the Web of Data pertains to the fact that an
identified resource can be related to other resources by means of the Resource Description
Framework (RDF). Among other things, RDF is an abstract data model that specifies the 
syntactic rules by which resources are connected. If U is the set of all URIs, B the set of all 
blank or anonymous nodes, and L the set of all literals, then the Web of Data is defined as 

$$W \subseteq ((U \cup B) \times U \times (U \cup B \cup L)).$$

A single statement (or triple) in $W$ is denoted $(s, p, o)$, where $s$ is called the subject, $p$ the
predicate, and $o$ the object. On the Web of Data

"[any man or machine can] start with one data source and then move through 
a potentially endless Web of data sources connected by RDF links. Just as 
the traditional document Web can be crawled by following hypertext links, the 
Web of Data can be crawled by following RDF links. Working on the crawled
data, search engines can provide sophisticated query capabilities, similar to
those provided by conventional relational databases. Because the query results
themselves are structured data, not just links to HTML pages, they can be
immediately processed, thus enabling a new class of applications based on the
Web of Data." [9]

^1 The URI is the parent class of both the URL and the Uniform Resource Name (URN) [55].
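The definition of W can be made concrete with a small sketch. The following Python fragment is only an illustration, not part of any RDF library: the class names `URI`, `BlankNode`, and `Literal`, the helper `is_valid_triple`, and the property `lanl:name` are assumptions introduced here. It checks that each position of a triple draws from the sets allowed by (U ∪ B) × U × (U ∪ B ∪ L).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: str = "xsd:string"


def is_valid_triple(s, p, o) -> bool:
    """Check a candidate (s, p, o) against (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (
        isinstance(s, (URI, BlankNode))               # subject: URI or blank node
        and isinstance(p, URI)                        # predicate: URI only
        and isinstance(o, (URI, BlankNode, Literal))  # object: anything
    )


# A tiny, hypothetical fragment of W (lanl:name is an illustrative property).
W = {
    (URI("lanl:marko"), URI("lanl:pet"), URI("lanl:fluffy")),
    (URI("lanl:fluffy"), URI("lanl:name"), Literal("Fluffy")),
}

all_valid = all(is_valid_triple(*t) for t in W)

# A literal cannot appear in the subject position.
literal_subject_ok = is_valid_triple(
    Literal("Fluffy"), URI("lanl:name"), URI("lanl:fluffy")
)
```

Under these assumptions, `W` is simply a set of tuples, which is the sense in which RDF is treated here as a data model rather than as a logic language.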

As a data model, RDF can conveniently represent commonly used data structures. From 
the knowledge representation and reasoning perspective, RDF provides the means to make 
assertions about the world and infer new statements given existing statements. From the 
network/graph analysis perspective, RDF supports the representation of various network 
data structures. From the programming and systems engineering perspective, RDF can be 
used to encode objects, instructions, stacks, etc. The Web of Data, with its general-purpose 
data model and supporting technological infrastructure, provides various computing models 
a shared, global, distributed space. Unfortunately, this general-purpose, multi-model vision 
was not the original intention of the designers of RDF. RDF was created for the domain of 
knowledge representation and reasoning. Moreover, it caters to a particular monotonic
subset of this domain [29]. RDF is not generally understood as supporting different computing
models. However, if the Web of Data is to be used as just that, a "web of data", then it is up 
to the applications leveraging this data to interpret what that data means and what it can be 
used for. 

The URI address space is an address space. It is analogous, in many ways, to the 
address space that exists in the local memory of the physical machines that support the 
representation of the Web of Data. With physical memory, information is contained at an 
address. For a 64-bit machine, that information is a 64-bit word. That 64-bit word can 
be interpreted as a literal primitive (e.g. a byte, an integer, a floating point value) or yet 
another 64-bit address (i.e. a pointer). This is how address locations denote data and link
to each other, respectively. On the Web of Data, a URI is simply an address as it does not
contain content.^2 It is through RDF that a URI address has content. For instance, with
RDF, a URI can reference a literal (i.e. xsd:byte, xsd:integer, xsd:float) or
another URI. Thus, RDF, as a data model, has many similarities to typical local memory. 
However, the benefit of URIs and RDF is that they create an inherently distributed and 
theoretically infinite space. Thus, the Web of Data can be interpreted as a large-scale, 
distributed memory structure. What is encoded and processed in that memory structure 
should not be dictated at the level of RDF, but instead dictated by the domains that leverage 
this medium for various application scenarios. The Web of Data should be realized as an 
application agnostic memory structure that supports a rich variety of uses: from Semantic 
Web reasoning, to Giant Global Graph analysis, to Web of Process execution. 

The intention of this article is to create a conceptual splinter that separates RDF from 
its legacy use as a logic language and demonstrate that it is more generally applicable when 
realized as only a data model. In this way, RDF as the foundational standard for the Web 
of Data makes the Web of Data useful to anyone wishing to represent information and 
compute in a global, distributed space. Three specific interpretations of the Web of Data are 
presented in order to elucidate the many ways in which the Web of Data is currently being
used. Moreover, within these different presentations, various standards and technologies
are discussed. These presentations are provided as summaries, not full descriptions. In
short, this article is more of a survey of a very large and multi-domained landscape. The
three interpretations that will be discussed are enumerated below.

^2 This is not completely true. Given that a URL is a subtype of a URI, and a URL can "contain" a file, it is
possible for a URI to "contain" information.

1. The Web of Data as a knowledge base (see §2).

• The Semantic Web is an interpretation of the Web of Data. 

• RDF is the means by which a model of a world is created. 

• There are many types of logic: logics of truth and logics of thought. 

• Scalable solutions exist for reasoning on the Web of Data. 

2. The Web of Data as a multi-relational network (see §3).

• The Giant Global Graph is an interpretation of the Web of Data. 

• RDF is the means by which vertices are connected together by labeled edges. 

• Single-relational network analysis algorithms can be applied to multi-relational 
networks. 

• Scalable solutions exist for network analysis on the Web of Data. 

3. The Web of Data as an object repository (see §4).

• The Web of Process is an interpretation of the Web of Data. 

• RDF is the means by which objects are represented and related to other objects. 

• An object's representation can include both its fields and its methods. 

• Scalable solutions exist for object-oriented computing on the Web of Data. 

The landscape presented in this article is by no means complete and only provides a glimpse 
into these different areas. Moreover, within each of these three presented interpretations, 
applications and use-cases are not provided. What is provided is a presentation of com- 
mon computing models that have been mapped to the Web of Data in order to take unique 
advantage of the Web as a computing infrastructure. 

2 A Distributed Knowledge Base 

The Web of Data can be interpreted as a distributed knowledge base — a Semantic Web. A 
knowledge base is composed of a set of statements about some "world". These statements 
are written in some language. Inference rules designed for that language can be used to 
derive new statements from existing statements. In other words, inference rules can be used 
to make explicit what is implicit. This process is called reasoning. The Semantic Web 
initiative is primarily concerned with this interpretation of the Web of Data. 

"For the Semantic Web to function, computers must have access to structured 
collections of information and sets of inference rules that they can use to con- 
duct automated reasoning." [8] 
Currently, the Semantic Web interpretation of the Web of Data forces strict semantics on
RDF. That is, RDF is not simply a data model, but a logic language. As a data model, it
specifies how a statement $\tau$ is constructed (i.e. $\tau \in ((U \cup B) \times U \times (U \cup B \cup L))$). As a logic
language, it specifies specific language constructs and semantics, that is, a way of interpreting what
statements mean. Because RDF was developed in concert with requirements provided by 
the knowledge representation and reasoning domain, RDF and the Semantic Web have been 
very strongly aligned for many years. This is perhaps the largest conceptual stronghold that 
exists as various W3C documents make this point explicit. 

"RDF is an assertional logic, in which each triple expresses a simple propo- 
sition. This imposes a fairly strict monotonic discipline on the language, so 
that it cannot express closed-world assumptions, local default preferences, and 
several other commonly used non-monotonic constructs." [29] 

RDF is monotonic in that any asserted statement $\tau \in W$ cannot be made "false" by future
assertions. In other words, the truth-value of a statement, once stated, does not change. 
RDF makes use of the open-world assumption in that if a statement is not asserted, this 
does not entail that it is "false". The open-world assumption is contrasted to the closed- 
world assumption found in many systems, where the lack of data is usually interpreted as 
that data being "false". 
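The difference between the two assumptions can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how an RDF store is implemented: the knowledge base is a plain set of tuples, the function names are hypothetical, and `None` is used here to stand for "unknown".

```python
from typing import Optional

# A knowledge base with a single asserted statement.
W = {("lanl:marko", "rdf:type", "lanl:Person")}


def ask_closed_world(kb, triple) -> bool:
    """Closed world: a statement that is not asserted is taken to be false."""
    return triple in kb


def ask_open_world(kb, triple) -> Optional[bool]:
    """Open world: a statement that is not asserted is unknown, not false."""
    return True if triple in kb else None


def assert_statement(kb, triple):
    """Monotonic update: asserting can only grow the set of statements."""
    return kb | {triple}


query = ("lanl:marko", "lanl:pet", "lanl:fluffy")
closed_answer = ask_closed_world(W, query)   # False
open_answer = ask_open_world(W, query)       # None, i.e. unknown
W_next = assert_statement(W, query)
```

Because `assert_statement` never removes anything, every statement in `W` is still present in `W_next`, which is the monotonic discipline described above.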

From this semantic foundation, extended semantics for RDF have been defined. The 
two most prevalent language extensions are the RDF Schema (RDFS) [14] and the Web 
Ontology Language (OWL) [39]. It is perhaps this stack of standards that forms the most 
common conception of what the Semantic Web is. However, if the Semantic Web is to be 
just that, a "semantic web", then there should be a way to represent other languages with 
different semantics. If RDF is forced to be a monotonic, open-world language, then this im- 
mediately pigeonholes what can be represented on the Semantic Web. If RDF is interpreted 
strictly as a data model, devoid of semantics, then any other knowledge representation lan- 
guage can be represented in RDF and thus, contribute to the Semantic Web. This section 
will discuss three logic languages: RDFS, OWL, and the Non-Axiomatic Logic (NAL) [58].
RDFS and OWL are generally understood in the Semantic Web community as these are the
primary logic languages used. However, NAL is a multi-valent, non-monotonic language
that, if it is to be implemented in the Semantic Web, requires that RDF be interpreted as a data
model, not as a logic language. Moreover, NAL is an attractive language for the Semantic
Web because its reasoning process is inherently distributed, can handle conflicting and
inconsistent data, and was designed on the assumption of insufficient knowledge and computing
resources.

2.1 RDF Schema 

RDFS is a simple language with a small set of inference rules [14]. In RDF, resources
(e.g. URIs and blank nodes) maintain properties (i.e. rdf:Property). These properties
are used to relate resources to other resources and literals. In RDFS, classes and properties
can be formally defined. Class definitions organize resources into abstract categories.
Property definitions specify the way in which these resources are related to one another.
For example, it is possible to state that there exist people and dogs (i.e. classes) and that people have
dogs as pets (i.e. a property). This is represented in RDFS in Figure 1.
Figure 1: An RDFS ontology that states that a person has a dog as a pet. The diagram
depicts the following statements:

(lanl:Person, rdf:type, rdfs:Class)
(lanl:Dog, rdf:type, rdfs:Class)
(lanl:pet, rdf:type, rdf:Property)
(lanl:pet, rdfs:domain, lanl:Person)
(lanl:pet, rdfs:range, lanl:Dog)



RDFS inference rules are used to derive new statements given existing statements that
use the RDFS language. RDFS inference rules make use of statements with the following
URIs:

• rdfs:Class: denotes a class as opposed to an instance.

• rdf:Property: denotes a property/role.

• rdfs:domain: denotes what a property projects from.

• rdfs:range: denotes what a property projects to.

• rdf:type: denotes that an instance is a type of class.

• rdfs:subClassOf: denotes that a class is a subclass of another.

• rdfs:subPropertyOf: denotes that a property is a sub-property of another.

• rdfs:Resource: denotes a generic resource.

• rdfs:Datatype: denotes a literal primitive class.

• rdfs:Literal: denotes a generic literal class.

RDFS supports two general types of inference: subsumption and realization. Subsumption
determines which classes are a subclass of another. The RDFS inference rules that support 
subsumption are 

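$$(?x, \texttt{rdf:type}, \texttt{rdfs:Class}) \Rightarrow (?x, \texttt{rdfs:subClassOf}, \texttt{rdfs:Resource}),$$
$$(?x, \texttt{rdf:type}, \texttt{rdfs:Datatype}) \Rightarrow (?x, \texttt{rdfs:subClassOf}, \texttt{rdfs:Literal}),$$
$$(?x, \texttt{rdfs:subPropertyOf}, ?y) \wedge (?y, \texttt{rdfs:subPropertyOf}, ?z) \Rightarrow (?x, \texttt{rdfs:subPropertyOf}, ?z),$$

and finally,

$$(?x, \texttt{rdfs:subClassOf}, ?y) \wedge (?y, \texttt{rdfs:subClassOf}, ?z) \Rightarrow (?x, \texttt{rdfs:subClassOf}, ?z).$$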

Thus, if both 
(lanl:Chihuahua, rdfs:subClassOf, lanl:Dog)
(lanl:Dog, rdfs:subClassOf, lanl:Mammal)

are asserted, then it can be inferred that

(lanl:Chihuahua, rdfs:subClassOf, lanl:Mammal).
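The transitivity rule for rdfs:subClassOf can be sketched as a small fixed-point computation. The following Python fragment is a minimal illustration, assuming triples are stored as plain tuples of strings; the function name `subclass_closure` is hypothetical and the loop is deliberately naive rather than efficient.

```python
def subclass_closure(triples):
    """Apply the rdfs:subClassOf transitivity rule until nothing new is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # All (subclass, superclass) pairs currently known.
        pairs = [(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"]
        for (x, y) in pairs:
            for (y2, z) in pairs:
                if y == y2:
                    new = (x, "rdfs:subClassOf", z)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


asserted = {
    ("lanl:Chihuahua", "rdfs:subClassOf", "lanl:Dog"),
    ("lanl:Dog", "rdfs:subClassOf", "lanl:Mammal"),
}
closure = subclass_closure(asserted)
```

With the two asserted statements from the text, `closure` contains exactly one additional statement, relating lanl:Chihuahua to lanl:Mammal.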

Next, realization is used to determine if a resource is an instance of a class. The RDFS
inference rules that support realization are 

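$$(?x, ?y, ?z) \Rightarrow (?x, \texttt{rdf:type}, \texttt{rdfs:Resource}),$$
$$(?x, ?y, ?z) \Rightarrow (?y, \texttt{rdf:type}, \texttt{rdf:Property}),$$
$$(?x, ?y, ?z) \Rightarrow (?z, \texttt{rdf:type}, \texttt{rdfs:Resource}),$$
$$(?x, \texttt{rdf:type}, ?y) \wedge (?y, \texttt{rdfs:subClassOf}, ?z) \Rightarrow (?x, \texttt{rdf:type}, ?z),$$
$$(?w, \texttt{rdfs:domain}, ?x) \wedge (?y, ?w, ?z) \Rightarrow (?y, \texttt{rdf:type}, ?x),$$

and finally,

$$(?w, \texttt{rdfs:range}, ?x) \wedge (?y, ?w, ?z) \Rightarrow (?z, \texttt{rdf:type}, ?x).$$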

Thus if, along with the statements in Figure 1,

(lanl:marko, lanl:pet, lanl:fluffy)

is asserted, then it can be inferred that

(lanl:marko, rdf:type, lanl:Person)
(lanl:fluffy, rdf:type, lanl:Dog).
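The domain and range rules can be sketched directly. The Python fragment below is an illustration only: it assumes the domain and range statements of Figure 1 (lanl:pet projects from lanl:Person to lanl:Dog), represents triples as tuples of strings, and uses a hypothetical function name `realize`. It covers just the two domain/range rules, not the full RDFS rule set.

```python
def realize(triples):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for (w, p, x) in triples:
        if p not in ("rdfs:domain", "rdfs:range"):
            continue
        # Every use of property w types its subject (domain) or object (range).
        for (y, used_property, z) in triples:
            if used_property != w:
                continue
            resource = y if p == "rdfs:domain" else z
            inferred.add((resource, "rdf:type", x))
    return inferred


knowledge_base = {
    ("lanl:pet", "rdfs:domain", "lanl:Person"),   # assumed from Figure 1
    ("lanl:pet", "rdfs:range", "lanl:Dog"),       # assumed from Figure 1
    ("lanl:marko", "lanl:pet", "lanl:fluffy"),
}
types = realize(knowledge_base)
```

Under these assumptions, `types` holds the two inferred statements given in the text: Marko is a person and Fluffy is a dog.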

Given a knowledge base containing statements, these inference rules continue to exe- 
cute until they no longer produce novel statements. It is the purpose of an RDFS reasoner 
to efficiently execute these rules. There are two primary ways in which inference rules are 
executed: at insert time and at query time. With respect to insert time, if a statement is 
inserted (i.e. asserted) into the knowledge base, then the RDFS inference rules execute to 
determine what is entailed by this new statement. These newly entailed statements are then 
inserted in the knowledge base and the process continues. While this approach ensures fast 
query times (as all entailments are guaranteed to exist at query time), it greatly increases the 
number of statements generated. For instance, given a deep class hierarchy, if a resource
is a type of one of the leaf classes, then it is asserted that it is a type of all the superclasses
of that leaf class. In order to alleviate the issue of "statement bloat," inference can instead
occur at query time. When a query is executed, the reasoner determines what other implicit 
statements should be returned with the query. The benefits and drawbacks of each approach 
are benchmarked, like much of computing, according to space vs. time. 
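The space versus time trade-off can be illustrated with a toy comparison. The sketch below is not how any particular reasoner works: it assumes a single-inheritance class hierarchy held in a dictionary, and the names `insert_time` and `query_time` are hypothetical. The first function materializes every entailed rdf:type statement when an instance is inserted; the second stores only the asserted statement and walks the hierarchy when a query arrives.

```python
# Single-inheritance hierarchy: class -> direct superclass (a simplifying assumption).
subclass_of = {
    "lanl:Chihuahua": "lanl:Dog",
    "lanl:Dog": "lanl:Mammal",
}


def superclasses(cls):
    """Walk up the hierarchy and collect every superclass of cls."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def insert_time(instance, cls):
    """Materialize all entailed rdf:type statements when the instance is inserted."""
    return {(instance, "rdf:type", c) for c in [cls] + superclasses(cls)}


def query_time(store, instance, cls):
    """Store only what was asserted; resolve the entailment when queried."""
    asserted = [o for (s, p, o) in store if s == instance and p == "rdf:type"]
    return any(cls == a or cls in superclasses(a) for a in asserted)


materialized = insert_time("lanl:fluffy", "lanl:Chihuahua")
lazy_store = {("lanl:fluffy", "rdf:type", "lanl:Chihuahua")}
is_mammal = query_time(lazy_store, "lanl:fluffy", "lanl:Mammal")
```

Here `materialized` holds three statements and answers the query by simple lookup, while `lazy_store` holds one statement and pays for the hierarchy walk on every query.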






Marko A. Rodriguez 



2.2 Web Ontology Language 

OWL is a more complicated language which extends RDFS by providing more expressive
constructs for defining classes [39]. Moreover, beyond subsumption and realization, OWL
provides inference rules to determine class and instance equivalence. There are many
OWL-specific inference rules. In order to give the flavor of OWL, without going into the many
specifics, this subsection will only present some examples of the more commonly used
constructs. For a fine, in-depth review of OWL, please refer to [36].

Perhaps the most widely used language URI in OWL is owl:Restriction. In
RDFS, a property can only have a domain and a range. In OWL, a class can apply the
following restrictions to a property:

• owl:cardinality

• owl:minCardinality

• owl:maxCardinality

• owl:hasValue

• owl:allValuesFrom

• owl:someValuesFrom

Cardinality restrictions are used to determine equivalence and inconsistency. For example,
in an OWL ontology, it is possible to state that a country can only have one president. This
is expressed in OWL as diagrammed in Figure 2. The _:1234 resource is a blank node that
denotes a restriction on the country class's lanl:president property.



Figure 2: An OWL ontology that states that the president of a country is a person and there
can be at most one president for a country. In the diagram, the blank node _:1234 is an
owl:Restriction with an owl:onProperty edge and an owl:maxCardinality of "1"^^xsd:int.



Next, if usa:barack and usa:obama are both asserted to be the president of the
United States with the statements

(usa:barack, lanl:president, usa:United_States)
(usa:obama, lanl:president, usa:United_States),

then it can be inferred (according to OWL rules) that these resources are equivalent.
This equivalence relationship is made possible because the maximum cardinality of the
lanl:president property of a country is 1. Therefore, if there are "two" people that
are president, then they must be the same person. This is made explicit when the reasoner
asserts the statements

(usa:barack, owl:sameAs, usa:obama)
(usa:obama, owl:sameAs, usa:barack).

Next, if lanl:herbertv is asserted to be different from usa:barack (which, from
the previous inference, was asserted to be the same as usa:obama) and lanl:herbertv is also
asserted to be the president of the United States, then an inconsistency is detected. Thus,
given the ontology asserted in Figure 2 and the previous assertions, asserting

(lanl:herbertv, owl:differentFrom, usa:barack)
(lanl:herbertv, lanl:president, usa:United_States)

causes an inconsistency. This inconsistency is due to the fact that a country can only have
one president and lanl:herbertv is not usa:barack.

Two other useful language URIs for properties in OWL are

• owl:SymmetricProperty

• owl:TransitiveProperty

In short, if y is symmetric, then if (x, y, z) is asserted, then (z, y, x) can be inferred. Next,
if the property y is transitive, then if (w, y, x) and (x, y, z) are asserted, then (w, y, z) can
be inferred.
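The president example can be traced with a short sketch. This is a hand-written illustration of the one inference pattern described above, not an OWL reasoner: the function names are hypothetical, triples are tuples of strings, and the cardinality restriction of Figure 2 is hard-coded as "at most one president per country" rather than read from an ontology.

```python
from itertools import combinations


def same_as_from_max_cardinality_one(triples, prop):
    """If prop admits at most one value per country, co-presidents must be identical."""
    by_country = {}
    for (s, p, o) in triples:
        if p == prop:
            by_country.setdefault(o, set()).add(s)
    inferred = set()
    for people in by_country.values():
        for a, b in combinations(sorted(people), 2):
            inferred.add((a, "owl:sameAs", b))
            inferred.add((b, "owl:sameAs", a))
    return inferred


def inconsistencies(triples):
    """Pairs asserted to be different that are also inferred to be the same."""
    return {
        (s, o)
        for (s, p, o) in triples
        if p == "owl:differentFrom" and (s, "owl:sameAs", o) in triples
    }


facts = {
    ("usa:barack", "lanl:president", "usa:United_States"),
    ("usa:obama", "lanl:president", "usa:United_States"),
}
equivalences = same_as_from_max_cardinality_one(facts, "lanl:president")

# Adding the conflicting assertions about lanl:herbertv.
kb = facts | {
    ("lanl:herbertv", "owl:differentFrom", "usa:barack"),
    ("lanl:herbertv", "lanl:president", "usa:United_States"),
}
kb |= same_as_from_max_cardinality_one(kb, "lanl:president")
conflicts = inconsistencies(kb)
```

The first call yields the two owl:sameAs statements from the text. After the lanl:herbertv statements are added, the same rule forces lanl:herbertv to be the same as usa:barack, which clashes with the owl:differentFrom assertion and appears in `conflicts`.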

There are various reasoners that exist for the OWL language. A popular OWL reasoner 
is Pellet [44]. The purpose of Pellet is to execute the OWL rules given existing statements in 
the knowledge base. For many large-scale knowledge base applications (i.e. triple- or quad- 
stores), the application provides its own reasoner. Popular knowledge bases that make use 
of the OWL language are OWLim [34], Oracle Spatial [3], and AllegroGraph [1]. It is 
noted that due to the complexity (in terms of implementation and running times), many 
knowledge base reasoners only execute subsets of the OWL language. For instance,
AllegroGraph's reasoner is called RDFS++ as it implements all of the RDFS rules and only
some of the OWL rules. However, it is also noted that RacerPro [26] can be used with 
AllegroGraph to accomplish complete OWL reasoning. Finally, OpenSesame [16] can be 
used for RDFS reasoning. Because OpenSesame is both a knowledge base and an API, 
knowledge base applications that implement the OpenSesame interfaces can automatically 
leverage the OpenSesame RDFS reasoner; though there may be speed issues as the reasoner 
is not natively designed for that knowledge base application. 

2.3 Non-Axiomatic Logic

If RDF is strictly considered a monotonic, open-world logic language, then the Semantic 
Web is solidified as an open-world, monotonic logic environment. If reasoning is restricted 
to the legacy semantics of RDF, then it will become more difficult to reason on the Semantic 
Web as it grows in size and as more inconsistent knowledge is introduced. Given the number
of statements on the Semantic Web, computational hurdles are met when reasoning with
RDFS and OWL. With inconsistent statements on the Semantic Web, it is difficult to reason
as inconsistencies are not handled gracefully in RDFS or OWL. In general, sound and
complete reasoning will not be feasible as the Semantic Web continues to grow. In order
to meet these challenges, the Large Knowledge Collider project (LarKC) is focused on
developing a reasoning platform to handle incomplete and inconsistent data [21].

"Researchers have developed methods for reasoning in rather small, closed, 
trustworthy, consistent, and static domains. They usually provide a small set 
of axioms and facts. [OWL] reasoners can deal with 10^ axioms (concept 
definitions), but they scale poorly for large instance sets. [...] There is a deep 
mismatch between reasoning on a Web scale and efficient reasoning algorithms 
over restricted subsets of first-order logic. This is rooted in underlying assump- 
tions of current systems for computational logic: small set of axioms, small 
number of facts, completeness of inference, correctness of inference rules and 
consistency, and static domains." [21] 

There is a need for practical methods to reason on the Semantic Web. One promising 
logic was founded on the assumption of insufficient knowledge and resources. This logic 
is called the Non-Axiomatic Logic (NAL) [57]. Unfortunately for the Semantic Web as
it is now, NAL breaks the assumptions of RDF semantics as NAL is multi-valent, non- 
monotonic, and makes use of statements with a subject-predicate form. However, if RDF is 
considered simply a data model, then it is possible to represent NAL statements and make 
use of its efficient, distributed reasoning system. Again, for the massive-scale, inconsistent 
world of the Semantic Web, sound and complete approaches are simply becoming more 
unreasonable. 

2.3.1 Language 

There are currently nine NAL languages. Each language, from NAL-0 to NAL-8, builds on
the constructs of the previous in order to support more complex statements. The following 
list itemizes the various languages and what can be expressed in each. 

• NAL-0: binary inheritance. 

• NAL-1: inference rules. 

• NAL-2: sets and variants of inheritance. 

• NAL-3: intersections and differences.

• NAL-4: products, images, and ordinary relations. 

• NAL-5: statement reification. 

• NAL-6: variables. 

• NAL-7: temporal statements. 

• NAL-8: procedural statements. 
Every NAL language is based on a simple inheritance relationship. For example, in
NAL-0, which assumes all statements are binary,

lanl:marko → lanl:Person

states that Marko (subject) inherits (→) from person (predicate). Given that all subjects and
predicates are joined by inheritance, there is no need to represent the copula when formally
representing a statement.^3 If RDF, as a data model, is to represent NAL, then one possible
representation for the above statement is

(lanl:marko, lanl:1234, lanl:Person),

where lanl:1234 serves as a statement pointer. This pointer could be, for example, a
128-bit Universally Unique Identifier (UUID) [37]. It is important to maintain a statement 
pointer as beyond NAL-0, statements are not simply "true" or "false". A statement's truth 
is not defined by its existence, but instead by extra numeric metadata associated with the 
statement. NAL maintains an 

"experience-grounded semantics [where] the truth value of a judgment indi- 
cates the degree to which the judgment is supported by the system's experi- 
ence. Defined in this way, truth value is system-dependent and time-dependent. 
Different systems may have conflicting opinions, due to their different experi- 
ences." [58] 

A statement has a particular truth value associated with it that is defined as the frequency of
supporting evidence (denoted $f \in [0,1]$) and the confidence in the stability of that frequency
(denoted $c \in [0,1]$). For example, beyond NAL-0, the statement "Marko is a person" is not
"100% true" simply because it exists. Instead, every time that aspects of Marko coincide
with aspects of person, $f$ increases. Likewise, every time aspects of Marko do not
coincide with aspects of person, $f$ decreases.^4 Thus, NAL is non-monotonic as its statement
evidence can increase and decrease. To demonstrate $f$ and $c$, the above "Marko is a person"
statement can be represented in NAL-1 as

lanl:marko → lanl:Person <0.9, 0.8>,

where, for the sake of this example, $f = 0.9$ and $c = 0.8$. In an RDF representation, this
can be denoted

(lanl:marko, lanl:1234, lanl:Person)
(lanl:1234, nal:frequency, "0.9"^^xsd:float)
(lanl:1234, nal:confidence, "0.8"^^xsd:float),

where lanl:1234 serves as a statement pointer allowing NAL's nal:frequency
and nal:confidence constructs to reference the inheritance statement.

^3 This is not completely true as different types of inheritance are defined in NAL-2 such as instance ∘→,
property →∘, and instance-property ∘→∘ inheritance. However, these 3 types of inheritance can also be
represented using the basic → inheritance. Moreover, the RDF representation presented can support the explicit
representation of other inheritance relationships if desired.

^4 The idea of "aspects coinciding" is formally defined in NAL, but is not discussed here for the sake of
brevity. In short, a statement's $f$ is modulated by both the system's "external" experiences and "internal"
reasoning; both create new evidence. See [60] for an in-depth explanation.
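The statement-pointer encoding of a NAL judgment can be sketched as a small helper. The function below is hypothetical and only mirrors the three-triple pattern shown in the text; minting a `urn:uuid:` pointer when none is supplied is an assumption suggested by the UUID remark, and literals are written as plain strings rather than through an RDF library.

```python
import uuid


def nal_judgment_to_rdf(subject, predicate, frequency, confidence, pointer=None):
    """Encode 'subject → predicate <f, c>' as three RDF-style triples."""
    if not (0.0 <= frequency <= 1.0 and 0.0 <= confidence <= 1.0):
        raise ValueError("frequency and confidence must lie in [0, 1]")
    # The pointer identifies the inheritance statement so metadata can refer to it.
    pointer = pointer or f"urn:uuid:{uuid.uuid4()}"
    return [
        (subject, pointer, predicate),
        (pointer, "nal:frequency", f'"{frequency}"^^xsd:float'),
        (pointer, "nal:confidence", f'"{confidence}"^^xsd:float'),
    ]


triples = nal_judgment_to_rdf(
    "lanl:marko", "lanl:Person", 0.9, 0.8, pointer="lanl:1234"
)
```

With the pointer fixed to lanl:1234, `triples` matches the three statements given above; omitting the pointer yields a fresh identifier for each judgment.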

NAL-4 supports statements that are more analogous to the subject-predicate-object form
of RDF. If Marko is denoted by the URI lanl:marko, Alberto by the URI ucla:apepe,
and friendship by the URI lanl:friend, then the statement "Alberto is a
friend of Marko" is denoted in RDF as

(ucla:apepe, lanl:friend, lanl:marko).

In NAL-4 this is represented as

(ucla:apepe × lanl:marko) → lanl:friend <0.8, 0.5>,

where $f = 0.8$ and $c = 0.5$ are provided for the sake of the example. This statement states
that the set (ucla:apepe, lanl:marko) inherits the property of friendship to a certain
degree and stability as defined by $f$ and $c$, respectively. The RDF representation of this
NAL-4 construct can be denoted

(lanl:2345, nal:_1, ucla:apepe)
(lanl:2345, nal:_2, lanl:marko)
(lanl:2345, lanl:3456, lanl:friend)
(lanl:3456, nal:frequency, "0.8"^^xsd:float)
(lanl:3456, nal:confidence, "0.5"^^xsd:float).

In the triples above, lanl:2345 serves as a set and thus, this set inherits from friendship.
That is, Alberto and Marko inherit the property of friendship.

2.3.2 Reasoning 

"In traditional logic, a 'valid' or 'sound' inference rule is one that never derives 
a false conclusion (that is, it will be contradicted by the future experience of the
system) from true premises [19]. [In NAL], a 'valid conclusion' is one that is 
most consistent with the evidence in the past experience, and a 'valid inference 
rule' is one whose conclusions are supported by the premises used to derive 
them." [60] 

Given that NAL is predicated on insufficient knowledge, there is no guarantee that reasoning 
will produce "true" knowledge with respect to the world that the statements are modeling as 
only a subset of that world is ever known. However, this does not mean that NAL reasoning 
is random; instead, it is consistent with respect to what the system knows. In other words,

"the traditional definition of validity of inference rules – that is to get true
conclusions from true premises – no longer makes sense in [NAL]. With insufficient
knowledge and resources, even if the premises are true with respect to the
past experience of the system there is no way to get infallible predictions about 
the future experience of the system even though the premises themselves may 
be challenged by new evidence." [58] 
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The inference rules in NAL are all syllogistic in that they are based on statements shar- 
ing similar terms (i.e. URIs) [45]. The typical inference rule in NAL has the following 
form 

(n < /i, ci > At2 < /2, C2 >) I- ra < /s, C3 >, 

where ri and T2 are statements that share a common term. There are four standard syllo- 
gisms used in NAL reasoning. These are enumerated below. 

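1. deduction: $(x \rightarrow y \langle f_1, c_1 \rangle \wedge y \rightarrow z \langle f_2, c_2 \rangle) \vdash x \rightarrow z \langle f_3, c_3 \rangle$.

2. induction: $(x \rightarrow y \langle f_1, c_1 \rangle \wedge z \rightarrow y \langle f_2, c_2 \rangle) \vdash x \rightarrow z \langle f_3, c_3 \rangle$.

3. abduction: $(x \rightarrow y \langle f_1, c_1 \rangle \wedge x \rightarrow z \langle f_2, c_2 \rangle) \vdash y \rightarrow z \langle f_3, c_3 \rangle$.

4. exemplification: $(x \rightarrow y \langle f_1, c_1 \rangle \wedge y \rightarrow z \langle f_2, c_2 \rangle) \vdash z \rightarrow x \langle f_3, c_3 \rangle$.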

Two other important inference rules not discussed here are choice (i.e. what to do with
contradictory evidence) and revision (i.e. how to update existing evidence with new
evidence). Each of the inference rules has a different formula for deriving
$\langle f_3, c_3 \rangle$ from $\langle f_1, c_1 \rangle$ and $\langle f_2, c_2 \rangle$. These formulas are enumerated below.

1. deduction: $f_3 = f_1 f_2$ and $c_3 = f_1 c_1 f_2 c_2$.

2. induction: fs = fi and C3 = j/ei'c'a+fc - 

3. abduction: /3 = /2 and C3 = f^cX+k - 

4. exemplification: $f_3 = 1$ and $c_3 = \frac{f_1 c_1 f_2 c_2}{f_1 c_1 f_2 c_2 + k}$.

The variable $k$ is a system-specific parameter used in the determination of confidence.
To demonstrate deduction, suppose the two statements

lanl:marko → lanl:Person <0.5, 0.5>

lanl:Person → lanl:Mammal <0.9, 0.9>.

Given these two statements and the inference rule for deduction, it is possible to infer

lanl:marko → lanl:Mammal <0.45, 0.2025>.
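The deduction step above can be checked with a minimal Python sketch. The names Truth and deduce are hypothetical and are not part of NAL or of any library; the sketch only applies the deduction formula (frequency f1·f2, confidence f1·c1·f2·c2) to the two statements given.

```python
from typing import NamedTuple


class Truth(NamedTuple):
    """A <frequency, confidence> pair attached to a statement (hypothetical name)."""
    frequency: float
    confidence: float


def deduce(t1: Truth, t2: Truth) -> Truth:
    """Deduction: f3 = f1 * f2 and c3 = f1 * c1 * f2 * c2."""
    f1, c1 = t1
    f2, c2 = t2
    return Truth(f1 * f2, f1 * c1 * f2 * c2)


# lanl:marko -> lanl:Person <0.5, 0.5> and lanl:Person -> lanl:Mammal <0.9, 0.9>
marko_person = Truth(0.5, 0.5)
person_mammal = Truth(0.9, 0.9)

# lanl:marko -> lanl:Mammal
marko_mammal = deduce(marko_person, person_mammal)
print(marko_mammal)
```

Up to floating-point rounding, the result agrees with the inferred statement <0.45, 0.2025>.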

Next suppose the statement

lanl:Dog → lanl:Mammal <0.9, 0.9>.

Given the existing statements, induction, and $k = 1$, it is possible to infer

lanl:marko → lanl:Dog <0.45, 0.0758>.

Thus, while the system is not confident, according to all that the system knows, Marko is
a type of dog. This is because there are aspects of Marko that coincide with aspects of
dog: they are both mammals. However, future evidence, such as fur, four legs, sloppy
tongue, etc. will be further evidence that Marko and dog do not coincide and thus, the $f$ of
lanl:marko → lanl:Dog will decrease.

Note that when the entailed statement already exists, its $\langle f_3, c_3 \rangle$ component is revised
according to the revision rule. Revision is not discussed in this article.

The significance of NAL reasoning is that all inference is based on local areas of the 
knowledge base. That is, all inference requires only two degrees of separation from the 
resource being inferred on. Moreover, reasoning is constrained by available computational 
resources, not by a requirement for logical completeness. Because of these two properties,
the implemented reasoning system is inherently distributed and, when computational
resources are not available, the system does not break; it simply yields fewer conclusions. For
the Semantic Web, it may be best to adopt a logic that is better able to take advantage of 
its size and inconsistency. With a reasoner that is distributable and functions under variable 
computational resources, and by making use of a language that is non-monotonic and sup- 
ports degrees of "truth", NAL may serve as a more practical logic for the Semantic Web. 
However, this is only possible if the RDF data model is separated from the RDF semantics 
and NAL's subject-predicate form can be legally represented. 

There are many other language constructs in NAL that are not discussed here. For
an in-depth review of NAL, please refer to the de facto reference [60]. Moreover, for a
fine discussion of the difference between logics of truth (i.e. mathematical logic: modern
predicate logic) and logics of thought (i.e. cognitive logic: NAL), see [59].



3 A Distributed Multi-Relational Network 

The Web of Data can be interpreted as a distributed multi-relational network: a Giant
Global Graph. A multi-relational network denotes a set of vertices (i.e. nodes) that are
connected to one another by a set of labeled edges (i.e. typed links). In the graph and
network theory community, the multi-relational network is less prevalent. The more commonly
used network data structure is the single-relational network, where all edges are of the same
type and thus, there is no need to label edges. Unfortunately, most network algorithms have
been developed for the single-relational network data structure. However, it is possible to
port all known single-relational network algorithms over to the multi-relational domain. In
doing so, it is possible to leverage these algorithms on the Giant Global Graph. The purpose
of this section is to



1. formalize the single-relational network (see §3.1),

2. formalize the multi-relational network (see §3.2),

3. present a collection of common single-relational network algorithms (see §3.3), and
then finally,

4. present a method for porting all known single-relational network algorithms over to
the multi-relational domain (see §3.4).



Network algorithms are useful in many respects and have been generally applied to
analysis and querying. If the network models an aspect of the world, then network analysis
techniques can be used to elucidate general structural properties of the network and thus, the
world. Moreover, network query algorithms have been developed for searching and ranking.
When these algorithms can be effectively and efficiently applied to the Giant Global Graph,
the Giant Global Graph can serve as a medium for network analysis and query.

The term "graph" is used in the mathematical domain of graph theory and the term "network" is used
primarily in the physics and computer science domain of network theory. In this chapter, both terms are used
depending on their source. Moreover, with regard to this article, these two terms are deemed synonymous with
each other.

A multi-relational network is also known as a directed labeled graph or semantic network.

3.1 Single-Relational Networks 

The single-relational network represents a set of vertices that are related to one another by
a homogeneous set of edges. For instance, in a single-relational coauthorship network, all
vertices denote authors and all edges denote a coauthoring relationship. Coauthorship exists
between two authors if they have written an article together. Moreover, coauthorship
is symmetric: if person x coauthored with person y, then person y has coauthored with
person x. In general, these types of symmetric networks are known as undirected,
single-relational networks and can be denoted

$$G' = (V, E \subseteq \{V \times V\}),$$

where $V$ is the set of vertices and $E$ is the set of undirected edges. The edge $\{i, j\} \in E$
states that vertex $i$ and $j$ are connected to each other. Figure 3 diagrams an undirected
coauthorship edge between two author vertices.



[Figure 3 (diagram): the author vertices lanl:marko and rpi:josh joined by an undirected lanl:coauthor edge.]

Figure 3: An undirected edge between two authors in an undirected single-relational
network.



Single-relational networks can also be directed. For instance, in a single-relational
citation network, the set of vertices denotes articles and the set of edges denotes citations
between the articles. In this scenario, the edges are not symmetric as one article citing
another does not imply that the cited article cites the citing article. Directed single-relational
networks can be denoted

$$G = (V, E \subseteq (V \times V)),$$

where $(i, j) \in E$ states that vertex $i$ is connected to vertex $j$. Figure 4 diagrams a directed
citation edge between two article vertices.



[Figure 4 (diagram): the article vertices aaai:evidence and joi:path_algebra joined by a directed lanl:cites edge.]

Figure 4: A directed edge between two articles in a directed single-relational network.



Both undirected and directed single-relational networks have a convenient matrix
representation. This matrix is known as an adjacency matrix and is denoted

$$A_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{otherwise,} \end{cases}$$






where $A \in \{0, 1\}^{|V| \times |V|}$. If $A_{i,j} = 1$, then vertex $i$ is adjacent (i.e. connected) to vertex
$j$. It is important to note that there exists an information-preserving, bijective mapping
between the set-theoretic and matrix representations of a network. Throughout the remainder
of this section, depending on the algorithm presented, one or the other form of a network
is used. Finally, note that the remainder of this section is primarily concerned with
directed networks as a directed network can model an undirected network. In other words,
the undirected edge $\{i, j\}$ can be represented as the two directed edges $(i, j)$ and $(j, i)$.
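The mapping between the set-theoretic and matrix representations can be illustrated with a minimal Python sketch. The function names adjacency_matrix, edge_set, and as_directed are hypothetical, and vertices are assumed to be the integers 0 to n-1.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix for a set of directed edges (i, j)."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
    return A


def edge_set(A):
    """Recover the directed edge set from an adjacency matrix."""
    return {(i, j) for i, row in enumerate(A) for j, a in enumerate(row) if a == 1}


def as_directed(undirected_edges):
    """Represent each undirected edge {i, j} as the two directed edges (i, j) and (j, i)."""
    directed = set()
    for i, j in undirected_edges:
        directed.add((i, j))
        directed.add((j, i))
    return directed


# A small directed citation network and a single undirected coauthorship edge.
citations = {(0, 1), (0, 2), (2, 1)}
A = adjacency_matrix(3, citations)
B = adjacency_matrix(2, as_directed({(0, 1)}))
```

Converting to the matrix and back returns the original edge set, which is the sense in which no information is lost.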



3.2 Multi-Relational Networks 

The multi-relational network is a more complicated structure that can be used to represent
multiple types of relationships between vertices. For instance, it is possible to not only
represent researchers, but also their articles in a network of edges that represent authorship,
citation, etc. A directed multi-relational network can be denoted

$$M = (V, \mathbb{E} = \{E_1, E_2, \ldots, E_m \subseteq (V \times V)\}),$$

where $\mathbb{E}$ is a family of edge sets such that any $E_k \in \mathbb{E} : 1 \leq k \leq m$ is a set of edges
with a particular meaning (e.g. authorship, citation, etc.). A multi-relational network can
be interpreted as a collection of single-relational networks that all share the same vertex
set. Another representation of a multi-relational network is similar to the one commonly
employed to define an RDF graph. This representation is denoted

$$M' \subseteq (V \times \Omega \times V),$$

where $\Omega$ is the set of edge labels. In this representation if $i, j \in V$ and $k \in \Omega$, then the
triple $(i, k, j)$ states that vertex $i$ is connected to vertex $j$ by the relationship type $k$.

Figure 5 diagrams multiple relationship types between scholars and articles in a
multi-relational network.



[Figure 5 (diagram): the scholar vertices lanl:marko and rpi:josh and the article vertices aaai:evidence and joi:path_algebra, connected by three lanl:authored edges and one lanl:cites edge.]

Figure 5: Multiple types of edges between articles and scholars in a directed
multi-relational network.



Like the single-relational network and its accompanying adjacency matrix, the
multi-relational network has a convenient 3-way tensor representation. This 3-way tensor is
denoted

$$\mathcal{A}^{k}_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E_k : 1 \leq k \leq m \\ 0 & \text{otherwise.} \end{cases}$$

This representation can be interpreted as a collection of adjacency matrix "slices", where
each slice is a particular edge type. In other words, if $\mathcal{A}^{k}_{i,j} = 1$, then $(i, k, j) \in M'$. Like
the relationship between the set-theoretic and matrix forms of a single-relational network,
$M$, $M'$, and $\mathcal{A}$ can all be mapped onto one another without loss of information. Each
representation will be used depending on the usefulness of its form with respect to the idea
being expressed.
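As a minimal sketch of the correspondence between the triple representation and the tensor "slices", the following Python fragment builds one adjacency matrix per edge label and maps back again. The function names are hypothetical, and the particular triples are assumed for illustration only; they are not read off Figure 5.

```python
def tensor_slices(n, triples):
    """Map triples (i, k, j) to a dict of n x n adjacency-matrix slices keyed by label k."""
    slices = {}
    for i, k, j in triples:
        if k not in slices:
            slices[k] = [[0] * n for _ in range(n)]
        slices[k][i][j] = 1
    return slices


def triples_of(slices):
    """Recover the set of triples from the slices."""
    return {(i, k, j)
            for k, A in slices.items()
            for i, row in enumerate(A)
            for j, a in enumerate(row) if a == 1}


# Assumed vertex numbering: 0 and 1 are scholars, 2 and 3 are articles.
triples = {
    (0, "lanl:authored", 2),
    (0, "lanl:authored", 3),
    (1, "lanl:authored", 3),
    (2, "lanl:cites", 3),
}
slices = tensor_slices(4, triples)
```

Each value of the dictionary is a single-relational adjacency matrix over the shared vertex set.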

On the Giant Global Graph, RDF serves as the specification for graphing resources. 
Vertices are denoted by URIs, blank nodes, and literals and the edge labels are denoted by 
URIs. Multi-relational network algorithms can be used to exploit the Giant Global Graph. 
However, there are few algorithms dedicated specifically to multi-relational networks. Most 
network algorithms have been designed for single-relational networks. The remainder of 
this section will discuss some of the more popular single-relational network algorithms 
and then present a method for porting these algorithms (as well as other single-relational 
network algorithms) over to the multi-relational domain. This section concludes with a 
distributable and scalable method for executing network algorithms on the Giant Global 
Graph. 



3.3 Single-Relational Network Algorithms 



The design and study of graph and network algorithms is conducted primarily by mathe- 
maticians (graph theory) [17], physicists and computer scientists (network theory) [12], and 
social scientists (social network analysis) [61]. Many of the algorithms developed in these 
domains can be used together and form the general-purpose "toolkit" for researchers doing
network analysis and for engineers developing network-based services. The following
itemized list presents a collection of the single-relational network algorithms that will be
reviewed in this subsection. As denoted with its name in the itemization, each algorithm 
can be used to identify properties of vertices, paths, or the network. Vertex metrics assign 
a real value to a vertex. Path metrics assign a real value to a path. And finally, network 
metrics assign a real value to the network as a whole. 



• shortest path: path metric (§3.3.1)

• eccentricity: vertex metric (§3.3.2)

• radius: network metric (§3.3.2)

• diameter: network metric (§3.3.2)

• closeness: vertex metric (§3.3.3)

• betweenness: vertex metric (§3.3.3)

• stationary probability distribution: vertex metric (§3.3.4)

• PageRank: vertex metric (§3.3.5)

• spreading activation: vertex metric (§3.3.6)

• assortative mixing: network metric (§3.3.7)



A simple intuitive approach to determine the appropriate algorithm to use for an application
scenario is presented in [35]. In short, various factors come into play when selecting
a network algorithm such as the topological features of the network (e.g. its connectivity
and its size), the computational requirements of the algorithm (e.g. its complexity), the
type of results that are desired (e.g. personalized or global), and the meaning of the
algorithm's result (e.g. geodesic-based, flow-based, etc.). The following sections will point out
which features describe the presented algorithms.

3.3.1 Shortest Path 

The shortest path metric is the foundation of all other geodesic metrics. The other geodesic
metrics discussed are eccentricity, radius, diameter, closeness, and betweenness. A shortest
path is defined for any two vertices $i, j \in V$ such that the sink vertex $j$ is reachable from
the source vertex $i$. If $j$ is unreachable from $i$, then the shortest path between $i$ and $j$ is
undefined. Thus, for geodesic metrics, it is important to only consider strongly connected
networks, or strongly connected components of a network. The shortest path between any
two vertices $i$ and $j$ in a single-relational network is the smallest of the set of all paths
between $i$ and $j$. If $\rho : V \times V \rightarrow Q$ is a function that takes two vertices and returns the
set of all paths $Q$ where for any $q \in Q$, $q = (i, \ldots, j)$, then the length of the shortest path
between $i$ and $j$ is $\min\left(\bigcup_{q \in Q} |q| - 1\right)$, where min returns the smallest value of its domain.
The shortest path function is denoted $s : V \times V \rightarrow \mathbb{N}$ with the function rule

$$s(i, j) = \min\left(\bigcup_{q \in \rho(i, j)} |q| - 1\right).$$
There are many algorithms to determine the shortest path between vertices in a net- 
work. Dijkstra's method is perhaps the most popular as it is the typical algorithm taught in 
introductory algorithms classes [20]. However, if the network is unweighted, then a simple 
breadth-first search is a more efficient way to determine the shortest path between i and 
j. Starting from i a "fan-out" search for j is executed where at each time step, adjacent 
vertices are traversed to. The first path that reaches j is the shortest path from i to j. 
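The breadth-first "fan-out" search described above can be sketched in Python as follows. The function name shortest_path_length and the adjacency-list encoding (a dict from each vertex to its list of out-neighbors) are assumptions made for illustration; the function returns None when the sink is unreachable, matching the case where the shortest path is undefined.

```python
from collections import deque


def shortest_path_length(adj, i, j):
    """Return the number of edges on a shortest path from i to j, or None if unreachable."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if v == j:
            # Breadth-first order guarantees the first arrival at j is a shortest path.
            return dist[v]
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return None


# A small unweighted, directed, strongly connected network.
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [0]}
print(shortest_path_length(adj, 0, 4))
```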

3.3.2 Eccentricity, Radius, and Diameter 

The radius and diameter of a network require the determination of the eccentricity of every
vertex in $V$. The eccentricity of a vertex $i$ is the largest shortest path between $i$ and all other
vertices in $V$ such that the eccentricity function $e : V \rightarrow \mathbb{N}$ has the rule

$$e(i) = \max\left(\bigcup_{j \in V} s(i, j)\right),$$

where max returns the largest value of its domain [28]. The eccentricity metric calculates
$|V| - 1$ shortest paths of a particular vertex.

The radius of the network is the minimum eccentricity of all vertices in $V$ [61]. The
function $r : G \rightarrow \mathbb{N}$ has the rule

$$r(G) = \min\left(\bigcup_{i \in V} e(i)\right).$$

Finally, the diameter of a network is the maximum eccentricity of the vertices in $V$ [61].
The function $d : G \rightarrow \mathbb{N}$ has the rule

$$d(G) = \max\left(\bigcup_{i \in V} e(i)\right).$$

Do not confuse a strongly connected network with a fully connected network. A fully connected network
is where every vertex is connected to every other vertex directly. A strongly connected network is where every
vertex is connected to every other vertex indirectly (i.e. there exists a path from any i to any j).
The diameter of a network is, in some cases, telling of the growth properties of the 
network (i.e. the general principle by which new vertices and edges are added). For instance, 
if the network is randomly generated (edges are randomly assigned between vertices), then 
the diameter of the network is much larger than if the network is generated according to
a more "natural growth" function such as a preferential attachment model, where highly 
connected vertices tend to get more edges (colloquially captured by the phrase "the rich 
get richer") [11]. Thus, in general, natural networks tend to have a much smaller diameter. 
This was evinced by an empirical study of the World Wide Web citation network, where the 
diameter of the network was concluded to be only 19 [2]. 
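The relationship between eccentricity, radius, and diameter can be sketched in Python. The function names and the adjacency-list encoding are assumptions for illustration, and the sketch assumes an unweighted, strongly connected network in which every vertex appears as a key.

```python
from collections import deque


def distances_from(adj, i):
    """Breadth-first search: shortest path length from i to every reachable vertex."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def eccentricity(adj, i):
    """The largest shortest path length from i to any vertex."""
    dist = distances_from(adj, i)
    if len(dist) != len(adj):
        raise ValueError("some vertex is unreachable from i")
    return max(dist.values())


def radius(adj):
    """The minimum eccentricity over all vertices."""
    return min(eccentricity(adj, i) for i in adj)


def diameter(adj):
    """The maximum eccentricity over all vertices."""
    return max(eccentricity(adj, i) for i in adj)


adj = {0: [1], 1: [2], 2: [3, 0], 3: [0]}
print(radius(adj), diameter(adj))
```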



3.3.3 Closeness and Betweenness Centrality 

Closeness and betweenness centrality are popular network metrics for determining the
"centralness" of a vertex and have been used in sociology [61], bioinformatics [43], and
bibliometrics [10]. Centrality is a loose term that describes the intuitive notion that some vertices
are more connected/integral/central/influential within the network than others. Closeness
centrality is one such centrality measure and is defined as the mean shortest path between
some vertex $i$ and all the other vertices in $V$ [5, 38, 52]. The closeness function is denoted
$c : V \rightarrow \mathbb{R}$.

Betweenness centrality is defined for a vertex in $V$ [13, 23]. The betweenness of $i \in V$
is the number of shortest paths that exist between all vertices $j, k \in V$ that have $i$ in their
path divided by the total number of shortest paths between $j$ and $k$, where $i \neq j \neq k$.
If $\sigma : V \times V \rightarrow Q$ is the function that returns the set of shortest paths between any two
vertices $j$ and $k$ such that

$$\sigma(j, k) = \bigcup_{q \in \rho(j, k)} q : |q| - 1 = s(j, k)$$

and $\sigma : V \times V \times V \rightarrow Q$ is the set of shortest paths between two vertices $j$ and $k$ that have
$i$ in the path, where

$$\sigma(j, k, i) = \bigcup_{q \in \rho(j, k)} q : (|q| - 1 = s(j, k) \wedge i \in q),$$

then the betweenness function $b : V \rightarrow \mathbb{R}$ has the rule

$$b(i) = \sum_{i \neq j \neq k \in V} \frac{|\sigma(j, k, i)|}{|\sigma(j, k)|}.$$

There are many variations to the standard representations presented above. For a more
in-depth review of these metrics, see [61] and [12]. Finally, centrality is not restricted only
to geodesic metrics. The next three algorithms are centrality metrics based on random walks
or "flows" through a network.
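The betweenness definition given above can be sketched in Python without enumerating paths explicitly. This is one possible approach and not necessarily the one used in the cited literature: a breadth-first search from each vertex records shortest path lengths and the number of distinct shortest paths, and a shortest path from j to k passes through i exactly when s(j, i) + s(i, k) = s(j, k). The function names and the adjacency-list encoding (no duplicate edges, every vertex a key) are assumptions.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, count): shortest path lengths from source and the number of
    distinct shortest paths from source to each reachable vertex."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:
                count[w] += count[v]
    return dist, count


def betweenness(adj, i):
    """Sum, over ordered pairs (j, k) with i, j, k distinct, of the fraction of
    shortest j-to-k paths that have i in their path."""
    info = {v: bfs_counts(adj, v) for v in adj}
    dist_i, count_i = info[i]
    total = 0.0
    for j in adj:
        if j == i:
            continue
        dist_j, count_j = info[j]
        if i not in dist_j:
            continue
        for k in adj:
            if k == i or k == j or k not in dist_j or k not in dist_i:
                continue
            if dist_j[i] + dist_i[k] == dist_j[k]:
                total += count_j[i] * count_i[k] / count_j[k]
    return total


adj = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
print([betweenness(adj, v) for v in adj])
```

In this small network, vertex 1 lies on one of the two shortest paths from 0 to 3 and on no other shortest path between distinct endpoints, so its betweenness is 0.5.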

3.3.4 Stationary Probability Distribution 

A Markov chain is used to model the states of a system and the probability of transition
between states [27]. A Markov chain is best represented by a probabilistic, single-relational
network where the states are vertices, the edges are transitions, and the edge weights denote
the probability of transition. A probabilistic, single-relational network can be denoted

$$G'' = (V, E \subseteq (V \times V), \omega : E \rightarrow [0, 1]),$$

where $\omega$ is a function that maps each edge in $E$ to a probability value. The outgoing edges
of any vertex form a probability distribution that sums to 1.0. In this section, all outgoing
probabilities from a particular vertex are assumed to be equal. Thus, $\forall j, k \in \Gamma^{+}(i) :
\omega(i, j) = \omega(i, k)$, where $\Gamma^{+}(i) \subseteq V$ is the set of vertices adjacent to $i$.

A random walker is a useful way to visualize the transitioning between vertices. A random
walker is a discrete element that exists at a particular $i \in V$ at a particular point in time
$t \in \mathbb{N}^{+}$. If the vertex at time $t$ is $i$ then the next vertex at time $t + 1$ will be one of the
vertices adjacent to $i$ in $\Gamma^{+}(i)$. In this manner, the random walker makes a probabilistic jump
to a new vertex at every time step. As time $t$ goes to infinity a unique stationary probability
distribution emerges if and only if the network is aperiodic and strongly connected. The
stationary probability distribution expresses the probability that the random walker will be
at a particular vertex in the network. In matrix form, the stationary probability distribution
is represented by a row vector $\pi \in [0, 1]^{|V|}$, where $\pi_i$ is the probability that the random
walker is at $i$ and $\sum_{i \in V} \pi_i = 1.0$. If the network is represented by the row-stochastic
adjacency matrix

$$A_{i,j} = \begin{cases} \frac{1}{|\Gamma^{+}(i)|} & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$

and if the network is aperiodic and strongly connected, then there exists some $\pi$ such that
$\pi A = \pi$. Thus, the stationary probability distribution is the primary eigenvector of $A$.
The primary eigenvector of a network is useful in ranking its vertices as those vertices that
are more central are those that have a higher probability in $\pi$. Thus, intuitively, where the
random walker is likely to be is an indicator of how central the vertex is. However, if the
network is not strongly connected (very likely for most natural networks), then a unique
stationary probability distribution is not guaranteed to exist.
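A minimal Python sketch of this computation follows. It uses power iteration, which is one common way to approximate the primary eigenvector and is assumed here for illustration; the function names, the iteration limit, and the tolerance are hypothetical choices. Vertices are assumed to be the integers 0 to n-1, and the example network is aperiodic and strongly connected.

```python
def row_stochastic(adj):
    """Build A with A[i][j] = 1/|out(i)| if (i, j) is an edge and 0 otherwise."""
    n = len(adj)
    A = [[0.0] * n for _ in range(n)]
    for i, out in adj.items():
        for j in out:
            A[i][j] = 1.0 / len(out)
    return A


def stationary_distribution(A, steps=1000, tol=1e-12):
    """Repeat pi <- pi A from a uniform start until successive vectors differ by less than tol."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(steps):
        nxt = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Cycles of length 2 and 3 through vertex 0 make this network aperiodic.
adj = {0: [1, 2], 1: [2], 2: [0]}
pi = stationary_distribution(row_stochastic(adj))
print(pi)
```

For this network the walker spends about 40% of its time at each of vertices 0 and 2 and about 20% at vertex 1.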

3.3.5 PageRank 

PageRank makes use of the random walker model previously presented [15]. However,
in PageRank, the random walker does not simply traverse the single-relational network by
moving between adjacent vertices, but instead has a probability of jumping, or
"teleporting", to some random vertex in the network. In some instances, the random walker will
follow an outgoing edge from its current vertex location. In other instances, the random
walker will jump to some other random vertex in the network that is not necessarily
adjacent to it. The benefit of this model is that it ensures that the network is strongly connected
and aperiodic and thus, there exists a stationary probability distribution. In order to calculate
PageRank, two networks are used. The standard single-relational network is represented as
the row-stochastic adjacency matrix

$$A_{i,j} = \begin{cases} \frac{1}{|\Gamma^{+}(i)|} & \text{if } (i, j) \in E \\ \frac{1}{|V|} & \text{if } \Gamma^{+}(i) = \emptyset \\ 0 & \text{otherwise.} \end{cases}$$

Any $i \in V$ where $\Gamma^{+}(i) = \emptyset$ is called a "rank-sink". Rank-sinks ensure that the network is
not strongly connected. To rectify this connectivity problem, all vertices that are rank-sinks
are connected to every other vertex with probability $\frac{1}{|V|}$. Next, for teleportation, a fully
connected network is created that is denoted $B_{i,j} = \frac{1}{|V|}$.

The random walker will choose to use $A$ or $B$ at time step $t$ as its transition network
depending on the probability value $\alpha \in (0, 1]$, where in practice, $\alpha = 0.85$. This means that
85% of the time the random walker will use the edges in $A$ to traverse, and the other 15% of
the time, the random walker will use the edges in $B$. The $\alpha$-biased union of the networks $A$
and $B$ guarantees that the random walker is traversing a strongly connected and aperiodic
network. The random walker's traversal network can be expressed by the matrix

$$C = \alpha A + (1 - \alpha) B.$$

The PageRank row vector $\pi \in [0, 1]^{|V|}$ has the property $\pi C = \pi$. Thus, the PageRank
vector is the primary eigenvector of the modified single-relational network. Moreover, $\pi$ is
the stationary probability distribution of $C$. From a certain perspective, the primary
contribution of the PageRank algorithm is not in the way it is calculated, but in how the network
is modified to support a convergence to a stationary probability distribution. PageRank has
been popularized by the Google search engine and has been used as a ranking algorithm in
various domains. Relative to the geodesic centrality algorithms presented previously,
PageRank is a more efficient way to determine a centrality score for all vertices in a network.
However, calculating the stationary probability distribution of a network is not cheap and
for large networks, cannot be accomplished in real-time. Local rank algorithms are more
useful for real-time results in large-scale networks such as the Giant Global Graph.
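The construction above can be sketched in Python as follows. The sketch assumes power iteration as the means of approximating the primary eigenvector, and the function name, iteration limit, and tolerance are hypothetical. Vertices are assumed to be the integers 0 to n-1, rank-sinks are given an edge to every vertex with probability 1/n, and the teleportation network has every entry equal to 1/n.

```python
def pagerank(adj, alpha=0.85, steps=1000, tol=1e-12):
    """Approximate the row vector pi with pi C = pi, where C = alpha*A + (1 - alpha)*B."""
    n = len(adj)
    # A: follow an outgoing edge uniformly at random; rank-sinks link to every vertex.
    A = [[0.0] * n for _ in range(n)]
    for i, out in adj.items():
        if out:
            for j in out:
                A[i][j] = 1.0 / len(out)
        else:
            A[i] = [1.0 / n] * n
    # B is the fully connected teleportation network with entries 1/n.
    C = [[alpha * A[i][j] + (1.0 - alpha) / n for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(steps):
        nxt = [sum(pi[i] * C[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# A chain 0 -> 1 -> 2 in which vertex 2 is a rank-sink.
adj = {0: [1], 1: [2], 2: []}
pi = pagerank(adj)
print(pi)
```

Although the chain itself is not strongly connected, the modified network is, and the iteration converges with vertex 2 ranked highest.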

3.3.6 Spreading Activation 

Both the stationary probability distribution and PageRank are global rank metrics. That is, 
they rank all vertices relative to all vertices and as such, require a full network perspective. 
However, for many applications, a local rank metric is desired. Local rank metrics rank 
a subset of vertices relative to some set of source vertices. Local rank metrics have the 
benefit of being faster to compute and being relative to a particular area of the network. For 
large-scale networks, local rank metrics are generally more practical for real-time queries. 

Perhaps the most popular local rank metric is spreading activation. Spreading activation
is a network analysis technique that was inspired by the spreading activation potential found
in biological neural networks [4, 18, 30]. This algorithm (and its many variants) has been
used extensively in semantic network reasoning and recommender systems. The purpose
of the algorithm is to expose, in a computationally efficient manner, those vertices which
are closest (in terms of a flow distance) to a particular set of vertices. For example, given
$i, j, k \in V$, if there exist many short recurrent paths between vertex $i$ and vertex $j$ and not
so between $i$ and $k$, then it can be assumed that vertex $i$ is more "similar" to vertex $j$ than $k$.
Thus, the returned ranking will rank $j$ higher than $k$ relative to $i$. In order to calculate this
distance, "energy" is assigned to vertex $i$. Let $x \in [0, 1]^{|V|}$ denote the energy vector, where
at the first time step all energy is at $i$ such that $x^{1}_{i} = 1.0$. The energy vector is propagated
over $A$ for $\ell \in \mathbb{N}^{+}$ number of steps by the equation $x^{t+1} = x^{t} A : t + 1 \leq \ell$. Moreover,
at every time step, $x$ is decayed some amount by $\delta \in [0, 1]$. At the end of the process, the
vertex that had the most energy flow through it (as recorded by $\pi \in \mathbb{R}^{|V|}$) is considered
the vertex that is most related to vertex $i$. Algorithm 1 presents this spreading activation
algorithm. The resultant $\pi$ provides a ranking of all vertices at most $\ell$ steps away from $i$.



begin
    $t = 1$
    while $t \leq \ell$ do
        $\pi = \pi + x$
        $x = (\delta x) A$
        $t = t + 1$
    end
    return $\pi$
end

Algorithm 1: A spreading activation algorithm.
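Algorithm 1 can be transcribed into Python as a minimal sketch. The function and parameter names are hypothetical, the loop is assumed to run for exactly the given number of steps, and the matrix is assumed to be a row-stochastic adjacency matrix stored as nested lists.

```python
def spreading_activation(A, source, steps, decay):
    """Accumulate in pi the energy that flows through each vertex, starting at source."""
    n = len(A)
    x = [0.0] * n
    x[source] = 1.0          # all energy starts at the source vertex
    pi = [0.0] * n
    for _ in range(steps):
        # record the energy currently at each vertex
        pi = [p + e for p, e in zip(pi, x)]
        # decay the energy vector, then propagate it over A
        x = [sum(decay * x[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


# Row-stochastic matrix for the edges 0 -> 1, 0 -> 2, 1 -> 2, and 2 -> 0.
A = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
ranking = spreading_activation(A, source=0, steps=3, decay=0.5)
print(ranking)
```

Relative to vertex 0, vertex 2 accumulates more energy than vertex 1 because it is reached both directly and by way of vertex 1.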

A class of algorithms known as "priors" algorithms perform computations similar to the 
local rank spreading activation algorithm, but do so using a stationary probability distribu- 
tion [62]. Much like the PageRank algorithm distorts the original network, priors algorithms 
distort the local neighborhood of the graph and require at every time step, with some
probability, that all random walkers return to their source vertex. The long run behavior of such
systems yields a ranking biased towards (or relative to) the source vertices and thus, can be
characterized as local rank metrics.

3.3.7 Assortative Mixing 

The final single-relational network algorithm discussed is assortative mixing. Assortative 
mixing is a network metric that determines if a network is assortative (colloquially captured 
by the phrase "birds of a feather flock together"), disassortative (colloquially captured by 
the phrase "opposites attract"), or uncorrelated. An assortative mixing algorithm returns 
values in [−1, 1], where 1 is assortative, −1 is disassortative, and 0 is uncorrelated. Given a
collection of vertices and metadata about each vertex, it is possible to determine the assor- 
tative mixing of the network. There are two assortative mixing algorithms: one for scalar 
or numeric metadata (e.g. age, weight, etc.) and one for nominal or categorical metadata 
(e.g. occupation, sex, etc.). In general, an assortative mixing algorithm can be used to 
answer questions such as: 

• Do friends in a social network tend to be the same age?

• Do colleagues in a coauthorship network tend to be from the same university?

• Do relatives in a kinship network tend to like the same foods? 

Note that to calculate the assortative mixing of a network, vertices must have metadata
properties. The typical single-relational network $G = (V, E)$ does not capture this information.
Therefore, assume some other data structure that stores metadata about each vertex. 

The original publication defining the assortative mixing metric for scalar properties used
the parametric Pearson correlation of two vectors [40]. One vector is the scalar value of
the vertex property for the vertices on the tail of all edges. The other vector is the scalar
value of the vertex property for the vertices on the head of all the edges. Thus, the length
of both vectors is $|E|$ (i.e. the total number of edges in the network). Formally, the Pearson
correlation-based assortativity is defined as

$$r = \frac{|E| \sum_i j_i k_i - \sum_i j_i \sum_i k_i}{\sqrt{\left[|E| \sum_i j_i^2 - \left(\sum_i j_i\right)^2\right]\left[|E| \sum_i k_i^2 - \left(\sum_i k_i\right)^2\right]}},$$

where $j_i$ is the scalar value of the vertex on the tail of edge $i$, and $k_i$ is the scalar value of
the vertex on the head of edge $i$. For nominal metadata, the equation

$$r = \frac{\sum_p e_{pp} - \sum_p a_p b_p}{1 - \sum_p a_p b_p}$$

yields a value in [−1, 1] as well, where $e_{pp}$ is the fraction of edges in the network that have
property value $p$ on both ends, $a_p$ is the fraction of edges in the network that have property
value $p$ on their tail vertex, and $b_p$ is the fraction of edges that have property value $p$ on their
head vertex [41].
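The scalar case can be sketched in Python as the Pearson correlation between the tail values and the head values over all edges. The function name scalar_assortativity and the dictionary of vertex metadata are assumptions for illustration; as noted above, the metadata is held in a data structure separate from the network itself.

```python
from math import sqrt


def scalar_assortativity(edges, value):
    """Pearson correlation of the scalar property at the tail and head of every edge."""
    m = len(edges)
    tails = [value[t] for t, _ in edges]
    heads = [value[h] for _, h in edges]
    sum_j, sum_k = sum(tails), sum(heads)
    sum_jk = sum(j * k for j, k in zip(tails, heads))
    sum_jj = sum(j * j for j in tails)
    sum_kk = sum(k * k for k in heads)
    numerator = m * sum_jk - sum_j * sum_k
    denominator = sqrt((m * sum_jj - sum_j ** 2) * (m * sum_kk - sum_k ** 2))
    return numerator / denominator


# Hypothetical ages for four people in a friendship network.
age = {0: 20, 1: 21, 2: 40, 3: 41}
similar_ages = [(0, 1), (2, 3)]      # friends of similar age
dissimilar_ages = [(0, 3), (2, 1)]   # friends of dissimilar age
print(scalar_assortativity(similar_ages, age), scalar_assortativity(dissimilar_ages, age))
```

The sketch divides by zero when either vector is constant, in which case the correlation is undefined.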



3.4 Porting Algorithms to the Multi-Relational Domain 

All the aforementioned algorithms are intended for single-relational networks. However, it
is possible to map these algorithms over to the multi-relational domain and thus, apply them
to the Giant Global Graph. In the simplest form, it is possible to ignore edge labels and
simply treat all edges in a multi-relational network as being equal. This method, of course,
does not take advantage of the rich structured data that multi-relational networks offer. If
only a particular single-relational slice of the multi-relational network is desired (e.g. a
citation network, lanl:cites), then this single-relational component can be isolated
and subjected to the previously presented single-relational network algorithms. However, if
a multi-relational network is to be generally useful, then a method that takes advantage of
the various types of edges in the network is desired. The methods presented next define
abstract/implicit paths through a network. By doing so, a multi-relational network can be
redefined as a "semantically rich" single-relational network. For example, in Figure 5, there
do not exist lanl:authorCites edges (i.e. if person i wrote an article that cites the
article of person j, then it is true that i lanl:authorCites j). However, this edge can
be automatically generated by making use of the lanl:authored and lanl:cites
edges. In this way, a breadth-first search or a random walk can use these automatically
generated, semantically rich edges. By using generated edges, it is possible to treat a
multi-type subset of the multi-relational network as a single-relational network.

Note, with regard to the scalar assortative mixing metric of §3.3.7, that for metadata property distributions
that are not normally distributed, a non-parametric correlation such as the Spearman $\rho$ or Kendall $\tau$ may be
the more useful correlation coefficient.

3.4.1 A Multi-Relational Path Algebra 

A path algebra is presented to map a multi-relational network to a single-relational network
in order to expose the multi-relational network to single-relational network algorithms. The
multi-relational path algebra summarized is discussed at length in [50]. In short, the path
algebra manipulates a multi-relational tensor, $\mathcal{A} \in \{0, 1\}^{|V| \times |V| \times |\Omega|}$, in order to derive a
semantically-rich, weighted single-relational adjacency matrix, $A \in \mathbb{R}^{|V| \times |V|}$. Uses of the
algebra can be generally defined as

$$\Delta : \{0, 1\}^{|V| \times |V| \times |\Omega|} \rightarrow \mathbb{R}^{|V| \times |V|},$$

where $\Delta$ is the path operation defined.

There are two primary operations used in the path algebra: traverse and filter. The
traverse operation is denoted $\cdot : \mathbb{R}^{|V| \times |V|} \times \mathbb{R}^{|V| \times |V|} \rightarrow \mathbb{R}^{|V| \times |V|}$ and uses standard matrix
multiplication as its function rule. Traverse is used to "walk" the multi-relational network. The idea behind
traverse is first described using a single-relational network example. If a single-relational
adjacency matrix is raised to the second power (i.e. multiplied with itself) then the resultant
matrix denotes how many paths of length 2 exist between vertices [17]. That is, $A^{(2)}_{i,j}$
(i.e. $(A \cdot A)_{i,j}$) denotes how many paths of length 2 go from vertex $i$ to vertex $j$. In general,
for any power $p$,

$$A^{(p)}_{i,j} = \sum_{l \in V} A^{(p-1)}_{i,l} \cdot A_{l,j} : p \geq 2.$$

This property can be applied to a multi-relational tensor. If $\mathcal{A}^{1}$ and $\mathcal{A}^{2}$ are multiplied
together then the resultant adjacency matrix denotes the number of paths of type $1 \rightarrow 2$ that exist
between vertices. For example, if $\mathcal{A}^{1}$ is the authorship adjacency matrix, then the
adjacency matrix $Z = \mathcal{A}^{1} \cdot \mathcal{A}^{1\top}$ denotes how many coauthorship paths exist between vertices,
where $\top$ transposes the matrix (i.e. inverts the edge directionality). In other words if Marko
(vertex $i$) and Johan (vertex $j$) have written 19 papers together, then $Z_{i,j} = 19$. However,
given that the identity element $Z_{i,i}$ may be greater than 0 (i.e. a person has coauthored with
themselves), it is important to remove all such reflexive coauthoring paths back to the
original author. In order to do this, the filter operation is used. Given the identity matrix $I$ and
the all 1 matrix $\mathbf{1}$,

$$Z = \left(\mathcal{A}^{1} \cdot \mathcal{A}^{1\top}\right) \circ (\mathbf{1} - I)$$
yields a true coauthorship adjacency matrix, where o : rI^I ^ 1^1 x rI^I ^1^1 is the entry-wise 
Hadamard matrix multiplication operation [31]. Hadamard matrix multiplication is defined 

"'other operations not discussed in tliis section are merge and weight. For a in depth presentation of the 
multi-relational path algebra, see [50]. 



A : {cijl^l^l^l^l"^! ^ rI^IxI^I 
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Ai,i-Bi,i 



AoB 



An 1 • Bj2 1 • • • An m ' Bjj i 



In this example, the Hadamard entry-wise multiplication operation applies an "identify fil- 
ter" to (^A^ ■ A^^^ that removes all paths back to the source vertices (i.e. back to the 
identity vertices) as it sets Zj j = 0. This example demonstrates that a multi-relational 
network can be mapped to a semantically-rich, single -relational network. In the original 
multi-relational network, there exists no coauthoring relationship. However, this relation 
exists implicitly by means of traversing and filtering particular pathsp] 
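The traverse and filter steps of the coauthorship example can be checked numerically on a toy network. The following is a minimal sketch and not the implementation from [50]: the five vertices, the authored edges, and the helper names (slice_from_edges, traverse, transpose, hadamard) are all assumptions made for illustration, and plain Python lists stand in for a matrix library.

```python
# Hypothetical vertex set: two people followed by three papers.
V = ["marko", "johan", "paper1", "paper2", "paper3"]
n = len(V)

# Hypothetical lanl:authored edges (person -> paper).
AUTHORED_EDGES = [
    ("marko", "paper1"), ("marko", "paper2"),
    ("johan", "paper1"), ("johan", "paper2"), ("johan", "paper3"),
]

def slice_from_edges(edges):
    """Build one |V| x |V| adjacency slice of the tensor from an edge list."""
    a = [[0] * n for _ in range(n)]
    for tail, head in edges:
        a[V.index(tail)][V.index(head)] = 1
    return a

def traverse(a, b):
    """Traverse: ordinary matrix multiplication, which counts joined paths."""
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    """Invert the edge directionality of a slice."""
    return [[a[j][i] for j in range(n)] for i in range(n)]

def hadamard(a, b):
    """Filter: entry-wise (Hadamard) multiplication."""
    return [[a[i][j] * b[i][j] for j in range(n)] for i in range(n)]

authored = slice_from_edges(AUTHORED_EDGES)
# The matrix (1 - I): zero on the diagonal, one everywhere else.
ones_minus_identity = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

unfiltered = traverse(authored, transpose(authored))    # A . A^T
coauthor = hadamard(unfiltered, ones_minus_identity)    # (A . A^T) o (1 - I)

marko, johan = V.index("marko"), V.index("johan")
print(unfiltered[marko][marko])  # 2: reflexive "coauthored with self" paths
print(coauthor[marko][marko])    # 0: removed by the identity filter
print(coauthor[marko][johan])    # 2: papers written together
```

The diagonal of the unfiltered product simply counts each person's own papers, which is why the filter is needed before the result can be read as a coauthorship network.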

The benefit of the summarized path algebra is that it can express various abstract paths
through a multi-relational tensor in an algebraic form. Thus, given the theorems of the algebra,
it is possible to simplify expressions in order to derive more computationally efficient
paths for deriving the same information. The primary drawback of the algebra is that it is
a matrix algebra that globally operates on adjacency matrix slices of the multi-relational
tensor $\mathcal{A}$. Given the size of the Giant Global Graph, it is not practical to execute global
matrix operations. However, these path expressions can be used as an abstract path that a
discrete "walker" can take when traversing local areas of the graph. This idea is presented
next.



3.4.2 Multi-Relational Grammar Walkers 

Previously, the stationary probability distribution, PageRank, and spreading activation
were defined as matrix operations. However, it is possible to represent these algorithms
using discrete random walkers. In fact, in many cases, this is the more natural representation
both in terms of intelligibility and scalability. For many, it is more intuitive to think of these
algorithms as being executed by a discrete random walker moving from vertex to vertex
recording the number of times it has traversed each vertex. In terms of scalability, all of
these algorithms can be approximated by using fewer walkers and thus, fewer computational
resources. Moreover, when represented as a swarm of discrete walkers, the algorithm is
inherently distributed as a walker is only aware of its current vertex and those vertices
adjacent to it.

For multi-relational networks, this same principle applies. However, instead of randomly
choosing an adjacent vertex to traverse to, the walker chooses a vertex that is dependent
upon an abstract path description defined for the walker. Walkers of this form are
called grammar-based random walkers [48]. A path for a walker can be defined using any
language such as the path algebra presented previously or SPARQL [46]. The following
examples are provided in SPARQL as it is the de facto query language for the Web of Data.
Given the coauthorship path description

$$\left(\mathcal{A}^{1} \cdot \mathcal{A}^{1\top}\right) \circ \left(\mathbf{1} - \mathbf{I}\right),$$

it is possible to denote this as a local walker computation in SPARQL as

SELECT ?dest WHERE {
  @ lanl:authored ?x .
  ?dest lanl:authored ?x .
  FILTER (@ != ?dest)
}

where the symbol @ denotes the current location of the walker (i.e. a parameter to the query)
and ?dest is a collection of potential locations for the walker to move to (i.e. the return
set of the query). It is important to note that the path algebra expression performs a global
computation while the SPARQL query representation distributes the computation to the
individual vertices (and thus, individual walkers). Given the set of resources that bind to
?dest, the walker selects a single resource from that set and traverses to it, at which
point @ is updated to that selected resource value. This process continues indefinitely and,
in the long run, the walker's location probability over V denotes the stationary
distribution of the walker in the Giant Global Graph according to the abstract coauthorship
path description. The SPARQL query redefines what is meant by an adjacent vertex by
allowing longer paths to be represented as single edges. Again, this is why it is stated that
such mechanisms yield semantically rich, single-relational networks.

¹⁰ While not explored in [50], it is possible to use the path algebra to create inference rules in a manner
analogous to the Semantic Web Rule Language (SWRL) [32]. Moreover, as explored in [50], it is possible to
perform any arbitrary SPARQL query [46] using the path algebra (save for greater-than/less-than comparisons
of and regular expressions on literals).
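The walker loop described above can be approximated in a few lines. This sketch rests on several assumptions that are not part of the original text: the Giant Global Graph is replaced by a small in-memory list of triples, the SPARQL query is emulated by a hypothetical Python function named coauthor_query, and the walk runs for a fixed number of steps with a seeded random generator instead of indefinitely.

```python
import random
from collections import Counter

# Hypothetical triples standing in for a tiny region of the Giant Global Graph.
TRIPLES = [
    ("lanl:marko", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper2"),
    ("lanl:herbert", "lanl:authored", "lanl:paper2"),
]

def coauthor_query(at):
    """Emulate the coauthorship query with `at` bound to the @ parameter.

    One result is produced per shared paper, so frequent coauthors appear
    more than once and are proportionally more likely to be chosen.
    """
    papers = [o for (s, p, o) in TRIPLES if s == at and p == "lanl:authored"]
    return sorted(s for (s, p, o) in TRIPLES
                  if p == "lanl:authored" and o in papers and s != at)

def walk(start, steps, seed=0):
    """Move a single walker and count how often each vertex is visited."""
    rng = random.Random(seed)
    at = start
    visits = Counter()
    for _ in range(steps):
        dest = coauthor_query(at)
        if not dest:              # no binding for ?dest: the walker is stuck
            break
        at = rng.choice(dest)     # select one resource and traverse to it
        visits[at] += 1
    return visits

visits = walk("lanl:marko", 1000)
total = sum(visits.values())
for vertex, count in sorted(visits.items()):
    print(vertex, count / total)
```

Because every coauthorship path in this toy data passes through lanl:johan, that vertex receives half of all visits; the remaining visits are split between the other two authors.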

In the previous coauthorship example, the grammar walker, at every vertex it encounters,
executes the same SPARQL query to locate "adjacent" vertices. In more complex
grammars, it is possible to chain together SPARQL queries into a graph of expressions such
that the walker moves not only through the Giant Global Graph, but also through a web of
SPARQL queries. Each SPARQL query defines a different abstract edge to be traversed.
This idea is diagrammed in Figure 6, where the grammar walker "walks" both the grammar
and the Giant Global Graph.



[Figure 6 diagram: a grammar of three SPARQL queries (SELECT ?dest1, SELECT ?dest2, SELECT ?dest3)
and a grammar walker positioned in both the grammar and the Giant Global Graph.]

Figure 6: A grammar walker maintains its state in the Giant Global Graph (its current vertex
location) and its state in the grammar (its current grammar location, i.e. a SPARQL query).
After executing its current SPARQL query, the walker moves to a new vertex in the Giant
Global Graph as well as to a new grammar location in the grammar.



To demonstrate a multiple SPARQL query grammar, a PageRank coauthorship grammar
is defined using two queries. The first query was defined above and the second query is

SELECT ?dest WHERE {
  ?dest rdf:type lanl:Person
}

This rule serves as the "teleportation" function utilized in PageRank to ensure a strongly
connected network. Thus, if there is an $\alpha$ probability that the first query will be executed and
a $(1 - \alpha)$ probability that the second rule will be executed, then coauthorship PageRank in
the Giant Global Graph is computed. Of course, the second rule can be computationally
expensive, but it serves to elucidate the idea.¹¹ It is noted that the stationary probability
distribution and the PageRank of the Giant Global Graph can be very expensive to compute
if the grammar does not reduce the traverse space to some small subset of the full Giant
Global Graph. In many cases, grammar walkers are more useful for calculating semantically
meaningful spreading activations. In this form, the Giant Global Graph can be searched
efficiently from a set of seed resources and a set of walkers that do not iterate indefinitely,
but instead, for some finite number of steps.
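The two-query grammar can be simulated with a single walker. The sketch below is illustrative only and makes several assumptions: the triples are invented, both SPARQL queries are emulated by hypothetical Python functions (coauthors and people), the value 0.85 for α is an arbitrary choice, and the walk is cut off after a fixed number of steps. When the first query returns no bindings (a rank sink), the walker falls back to the second query, following the caveat the paper notes for rank sinks.

```python
import random
from collections import Counter

# Hypothetical data: four people, one of whom has no coauthors (a rank sink).
TRIPLES = [
    ("lanl:marko", "rdf:type", "lanl:Person"),
    ("lanl:johan", "rdf:type", "lanl:Person"),
    ("lanl:herbert", "rdf:type", "lanl:Person"),
    ("lanl:alone", "rdf:type", "lanl:Person"),
    ("lanl:marko", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper2"),
    ("lanl:herbert", "lanl:authored", "lanl:paper2"),
    ("lanl:alone", "lanl:authored", "lanl:paper3"),
]

def coauthors(at):
    """First query: people sharing an authored paper with `at`."""
    papers = [o for (s, p, o) in TRIPLES if s == at and p == "lanl:authored"]
    return sorted(s for (s, p, o) in TRIPLES
                  if p == "lanl:authored" and o in papers and s != at)

def people():
    """Second query: every lanl:Person (the teleportation rule)."""
    return sorted(s for (s, p, o) in TRIPLES
                  if p == "rdf:type" and o == "lanl:Person")

def coauthorship_pagerank(steps, alpha=0.85, seed=0):
    rng = random.Random(seed)
    at = rng.choice(people())
    visits = Counter()
    for _ in range(steps):
        # With probability alpha run the first query, otherwise the second.
        dest = coauthors(at) if rng.random() < alpha else []
        if not dest:
            # Teleport: chosen by chance, or forced because of a rank sink.
            dest = people()
        at = rng.choice(dest)
        visits[at] += 1
    return {vertex: count / steps for vertex, count in visits.items()}

ranks = coauthorship_pagerank(20000)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 3))
```

Note that people() enumerates every person on each teleport, which mirrors the remark that the second rule can be computationally expensive on a graph of realistic size.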



The geodesic algorithms previously defined in §3.3 can be executed in an analogous
fashion using grammar-based geodesic walkers [51]. The difference between a geodesic
walker and a random walker is that the geodesic walker creates a "clone" walker each time
it is adjacent to multiple vertices. This is contrasted to the random walker, which
randomly chooses a single adjacent vertex. This cloning process implements
a breadth-first search. It is noted that geodesic algorithms have high algorithmic
complexity and thus, unless the grammar can be defined such that only a small subset of the
Giant Global Graph is traversed, such algorithms should be avoided. In general, the
computational requirements of the algorithms in single-relational networks also apply to
multi-relational networks. However, in multi-relational networks, given that adjacency is
determined through queries, multi-relational versions of these algorithms are more costly.
Given that the Giant Global Graph will soon grow to become the largest network
instantiation in existence, being aware of such computational requirements is a necessity.
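The cloning behavior of a geodesic walker amounts to a breadth-first search over the grammar-defined adjacency. The following sketch is an assumption-laden illustration rather than the algorithm of [51]: the triples are invented, the coauthorship query is emulated by a hypothetical coauthors function, and a max_depth parameter is added to keep the traversal within a small subset of the graph.

```python
# Hypothetical triples: a chain of coauthorship marko - johan - herbert.
TRIPLES = [
    ("lanl:marko", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper1"),
    ("lanl:johan", "lanl:authored", "lanl:paper2"),
    ("lanl:herbert", "lanl:authored", "lanl:paper2"),
]

def coauthors(at):
    """Grammar-defined adjacency: distinct people sharing a paper with `at`."""
    papers = {o for (s, p, o) in TRIPLES if s == at and p == "lanl:authored"}
    return sorted({s for (s, p, o) in TRIPLES
                   if p == "lanl:authored" and o in papers and s != at})

def geodesic_distances(source, max_depth):
    """Shortest grammar-path length from source to each reachable vertex."""
    distance = {source: 0}
    frontier = [source]                   # one walker per frontier vertex
    for depth in range(1, max_depth + 1):
        clones = []
        for at in frontier:
            for dest in coauthors(at):    # clone once per adjacent vertex
                if dest not in distance:  # keep only the first (shortest) arrival
                    distance[dest] = depth
                    clones.append(dest)
        if not clones:
            break
        frontier = clones
    return distance

print(geodesic_distances("lanl:marko", max_depth=5))
```

Each level of the search issues one query per walker, so the number of queries grows with the size of the frontier; this is the cost the text warns about when the grammar does not restrict the traversal.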

Finally, a major concern with the Web of Data as it is right now is that data is pulled 
to a machine for processing. That is, by resolving an http-based URI, an RDF subgraph 
is returned to the retrieving machine. This is the method advocated by the Linked Data 
community [9]. Thus, walking the Giant Global Graph requires pulling large amounts of 
data over the wire. For large network traversals, instead of moving the data to the process, 
it may be better to move the process to the data. By discretizing the process (e.g. using 
walkers) it is possible to migrate walkers between the various servers that support the Giant 
Global Graph. These ideas are being further developed in future work. 



¹¹ Note that this description is not completely accurate as "rank sinks" in the first query (when ?dest = ∅)
will halt the process. Thus, in such cases, when the process halts, the second query should be executed, at
which point rank sinks are alleviated and PageRank is calculated.

4 A Distributed Object Repository

The Web of Data can be interpreted as a distributed object repository, a Web of Process.
An object, from the perspective of object-oriented programming, is defined as a discrete
entity that maintains

• fields: properties associated with the object. These may be pointers to literal primitives
such as characters, integers, etc. or pointers to other objects.

• methods: behaviors associated with the object. These are the instructions that an
object executes in order to change its state and the state of the objects it references.

Objects are abstractly defined in source code. Source code is written in a human
readable/writeable language. An example Person class defined in the Java language is
presented below. This particular class has two fields (i.e. age and friends) and one method
(i.e. makeFriend).

import java.util.ArrayList;
import java.util.Collection;

public class Person {
    int age;
    // initialized so that makeFriend() has a collection to add to
    Collection<Person> friends = new ArrayList<Person>();

    public void makeFriend(Person p) {
        this.friends.add(p);
    }
}

There is an important distinction between a class and an object. A class is an abstract
description of an object. Classes are written in source code. Objects are created during the
run-time of the executed code and embody the properties of their abstract class. In this way,
objects instantiate (or realize) classes. Before objects can be created, a class described in
source code must be compiled so that the machine can more efficiently process the code.
In other words, the underlying machine has a very specific instruction set (or language)
that it uses. It is the role of the compiler to translate source code into machine-readable
instructions. Instructions can be represented in the native language of the hardware processor
(i.e. according to its instruction set) or they can be represented in an intermediate language
that can be processed by a virtual machine (i.e. software that simulates the behavior of a
hardware machine). If a virtual machine language is used, it is ultimately the role of the
virtual machine to translate the instructions it is processing into the instruction set used by
the underlying hardware machine. However, the computing stack does not end there. It is
ultimately up to the "laws of physics" to alter the state of the hardware machine. As the
hardware machine changes states, it alters the state of all the layers of abstraction built
atop it.

Object-oriented programming is perhaps the most widely used software development 
paradigm and is part of the general knowledge of most computer scientists and engineers. 
Examples of the more popular object-oriented languages include C++, Java, and Python. 
Some of the benefits of object-oriented programming are itemized below. 

• abstraction: representing a problem intuitively as a set of interacting objects. 

• encapsulation: methods and fields are "bundled" with particular objects. 

• inheritance: subclasses inherit the fields and methods of their parent classes. 

In general, as systems scale, the management of large bodies of code is made easier through
the use of object-oriented programming.

There exist many similarities between the RDFS and OWL Semantic Web ontology
languages discussed earlier and the typical object-oriented programming languages
previously mentioned. For example, in the ontology languages, there exist the notions of classes,
their instances (i.e. objects), and instance properties (i.e. fields).¹² However, the biggest
differentiator is that objects in object-oriented environments maintain methods. The only
computations that occur in RDFS and OWL are through the inference rules of the logic they
implement and as such are not specific to particular classes. Even if rules are implemented
for particular classes (for example, in SWRL [32]), such rule languages are not typically
Turing-complete [54] and thus, do not support general-purpose computing.

In order to bring general-purpose, object-oriented computing to the Web of Data, various
object-oriented languages have been developed that represent their classes and their
objects in RDF. Much like rule languages such as SWRL have an RDF encoding, these
object-oriented languages do as well. However, they are general-purpose imperative
languages that can be used to perform any type of computation. Moreover, they are
object-oriented so that they have the benefits associated with object-oriented systems itemized
previously. When human readable/writeable source code written in an RDF programming
language is compiled, it is compiled into RDF. By explicitly encoding methods (their
instruction-level data) in RDF, the Web of Data is transformed into a Web of Process.¹³ The
remainder of this section will discuss three computing models on the Web of Process:

1. partial object repository: where typical object-oriented languages utilize the Web of 
Process to store object field data, not class descriptions and methods. 

2. full object repository: where RDF-based object-oriented languages encode classes, 
object fields, and object methods in RDF. 

3. virtual machine repository: where RDF-based classes, objects, and virtual machines 
are represented in the Web of Process. 



¹² It is noted that the semantics of inheritance and properties in object-oriented languages are different than
those of RDFS and OWL. Object-oriented languages are frame-based and tend to assume a closed world [56].
Also, there does not exist the notion of sub-properties in object-oriented languages as fields are not "first-class
citizens."

¹³ It is noted that the Web of Process is not specifically tied to object-oriented languages. For example, the
Ripple programming language is a relational language where computing instructions are stored in rdf:Lists
[53]. Ripple is generally useful for performing complex query and insert operations on the Web of Process.
Moreover, because programs are denoted by URIs, it is easy to link programs together by referencing URIs.

4.1 Partial Object Repository

The Web of Process can be used as a partial object repository. In this sense, objects
represented in the Web of Process only maintain their fields, not their methods. It is the purpose
of some application represented external to the Web of Process to store and retrieve object
data from the Web of Process. In many ways, this model is analogous to a "black board"
tuple-space [24].¹⁴ By converting the data that is encoded in the Web of Process to an object
instance, the Web of Process serves as a database for populating the objects of an
application. It is the role of this application to provide a mapping from the RDF encoded object to
its object representation in the application (and vice versa for storage). A simple mapping
is that a URI can denote a pointer to a particular object. The predicates of the statements
that have the URI as a subject are seen as the field names. The objects of those statements
are the values of those fields. For example, given the Person class previously defined, an
instance in RDF can be represented as

(lanl:1234, rdf:type, lanl:Person)
(lanl:1234, lanl:age, "29"^^xsd:int)
(lanl:1234, lanl:friend, lanl:2345)
(lanl:1234, lanl:friend, lanl:3456)
(lanl:1234, lanl:friend, lanl:4567),

where lanl:1234 represents the Person object and the lanl:friend properties
point to three different Person instances. This simple mapping can be useful for many
types of applications. However, it is important to note that there exists a mismatch between
the semantics of RDF, RDFS, and OWL and typical object-oriented languages. In order
to align both languages it is possible either to 1.) ignore RDF/RDFS/OWL semantics and
interpret RDF as simply a data model for representing an object or 2.) make use of
complicated mechanisms to ensure that the external object-oriented environment is faithful to
such semantics [33].

¹⁴ Object-spaces such as JavaSpaces are a modern object-oriented use of a tuple-space [22].

Various RDF-to-object mappers exist. Examples include Schemagen,¹⁵ Elmo,¹⁶ and
ActiveRDF [42]. RDF-to-object mappers usually provide support to 1.) automatically
generate class definitions in the non-RDF language, 2.) automatically populate these objects
using RDF data, and 3.) automatically write these objects to the Web of Process. With
RDF-to-object mapping, what is preserved in the Web of Process is the description of the data
contained in an object (i.e. its fields), not an explicit representation of the object's process
information (i.e. its methods). However, there exist RDF object-oriented programming
languages that represent methods and their underlying instructions in RDF.
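The populate and write steps of an RDF-to-object mapper can be illustrated with a hand-written mapping. This is a minimal sketch and does not reflect the API of Schemagen, Elmo, or ActiveRDF: the Person dataclass, the load_person and store_person functions, and the use of plain string tuples for triples are all assumptions, the xsd:int datatype is dropped for brevity, and RDFS/OWL semantics are ignored so that RDF is treated purely as a data model.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The Person instance from the text, with the literal kept as a plain string.
TRIPLES: Set[Triple] = {
    ("lanl:1234", "rdf:type", "lanl:Person"),
    ("lanl:1234", "lanl:age", "29"),
    ("lanl:1234", "lanl:friend", "lanl:2345"),
    ("lanl:1234", "lanl:friend", "lanl:3456"),
    ("lanl:1234", "lanl:friend", "lanl:4567"),
}

@dataclass
class Person:
    uri: str                                          # the URI acts as the object pointer
    age: int = 0
    friends: List[str] = field(default_factory=list)  # URIs of other Person objects

def load_person(uri: str, triples: Set[Triple]) -> Person:
    """Populate an object: predicates become field names, objects become values."""
    person = Person(uri)
    for s, p, o in sorted(triples):
        if s != uri:
            continue
        if p == "lanl:age":
            person.age = int(o)
        elif p == "lanl:friend":
            person.friends.append(o)
    return person

def store_person(person: Person) -> Set[Triple]:
    """Write an object back as triples; only fields are stored, never methods."""
    triples = {
        (person.uri, "rdf:type", "lanl:Person"),
        (person.uri, "lanl:age", str(person.age)),
    }
    triples.update((person.uri, "lanl:friend", f) for f in person.friends)
    return triples

person = load_person("lanl:1234", TRIPLES)
print(person.age, person.friends)
print(store_person(person) == TRIPLES)
```

Nothing about the behavior of Person survives the round trip, which is the sense in which this model is only a partial object repository.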



¹⁵ Schemagen is currently available at http://jena.sourceforge.net/
¹⁶ Elmo is currently available at http://www.openrdf.org/

4.2 Full Object Repository

The following object-oriented languages compile human readable/writeable source code
into RDF: Adenine [47], Adenosine, FABL [25], and Neno [49]. The compilation process
creates a full RDF representation of the classes defined. The instantiated objects of these
classes are also represented in RDF. Thus, the object fields and their methods are stored in
the Web of Process. Each aforementioned RDF programming language has an
accompanying virtual machine. It is the role of the respective virtual machine to query the Web of
Process for objects, execute their methods, and store any changes to the objects back into
the Web of Process.

Given that these languages are designed specifically for an RDF environment and in
many cases, make use of the semantics defined for RDFS and OWL, the object-oriented
nature of these languages tends to be different than typical languages such as C++ and Java.
Multiple inheritance, properties as classes, methods as classes, unique SPARQL-based
language constructs, etc. can be found in these languages. To demonstrate methods as classes
and unique SPARQL-based language constructs, two examples are provided from
Adenosine and Neno, respectively. In Adenosine, methods are declared irrespective of a class and
can be assigned to classes as needed.

(lanl:makeFriend, rdf:type, std:Method)
(lanl:makeFriend, std:onClass, lanl:Person)
(lanl:makeFriend, std:onClass, lanl:Dog).

Next, in Neno, it is possible to make use of the inverse query capabilities of SPARQL. The
Neno statement

rpi:josh.lanl:friend.lanl:age;

is typical in many object-oriented languages: the age of the friends of Josh.¹⁷ However, the
statement

rpi:josh..lanl:friend.lanl:age;

is not. This statement makes use of "dot dot" notation and is called inverse field referencing.
This particular example returns the age of all the people that are friends with Josh. That is, it
determines all the lanl:Person objects that have rpi:josh as a lanl:friend and
then returns the xsd:int of their lanl:age. This expression resolves to the SPARQL
query

SELECT ?y WHERE {
  ?x <lanl:friend> <rpi:josh> .
  ?x <lanl:age> ?y }
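The difference between the two statements can be emulated over a small set of triples. This sketch is not Neno and does not reflect how the Neno compiler works: the triples and the helper names dot and dot_dot are hypothetical, and literals are kept as plain strings.

```python
# Hypothetical triples: Josh names lanl:1234 as a friend,
# while lanl:2345 names Josh as a friend.
TRIPLES = {
    ("rpi:josh", "lanl:friend", "lanl:1234"),
    ("lanl:1234", "lanl:age", "29"),
    ("lanl:2345", "lanl:friend", "rpi:josh"),
    ("lanl:2345", "lanl:age", "31"),
}

def dot(subjects, predicate):
    """Field referencing: follow the predicate forward from each subject."""
    return sorted(o for (s, p, o) in TRIPLES
                  if p == predicate and s in subjects)

def dot_dot(objects, predicate):
    """Inverse field referencing: find subjects that point at the objects."""
    return sorted(s for (s, p, o) in TRIPLES
                  if p == predicate and o in objects)

# rpi:josh.lanl:friend.lanl:age  -> ages of the people Josh lists as friends
forward = dot(dot(["rpi:josh"], "lanl:friend"), "lanl:age")

# rpi:josh..lanl:friend.lanl:age -> ages of the people who list Josh as a friend
inverse = dot(dot_dot(["rpi:josh"], "lanl:friend"), "lanl:age")

print(forward)  # ['29']
print(inverse)  # ['31']
```

The inverse case corresponds to the SPARQL pattern in which rpi:josh appears in the object position of the lanl:friend statement.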

In RDF programming languages, there does not exist the impedance mismatch that
occurs when integrating typical object-oriented languages with the Web of Process.
Moreover, such languages can leverage many of the standards and technologies associated with
the Web of Data in general. In typical object-oriented languages, the local memory serves
as the object storage environment. In RDF object-oriented languages, the Web of Process
serves this purpose. An interesting consequence of this model is that because compiled
classes and instantiated objects are stored in the Web of Process, RDF software can easily
reference other RDF software in the Web of Process. Instead of pointers being 32- or 64-bit
addresses in local memory, pointers are URIs. In this medium, the Web of Process is a
shared memory structure by which all the world's software and data can be represented,
interlinked, and executed.

"The formalization of computation within RDF allows active content to be 
integrated seamlessly into RDF repositories, and provides a programming en- 
vironment which simplifies the manipulation of RDF when compared to use of 
a conventional language via an API." [25] 

¹⁷ Actually, this is not that typical as fields cannot denote multiple objects in most object-oriented languages.
In order to reference multiple objects, fields tend to reference an abstract "collection" object that contains
multiple objects within it (e.g. an array).

The previously mentioned benefits of RDF programming are itemized below.

• the language and RDF are strongly aligned: there is a more direct mapping of the 
language constructs and the underlying RDF representation. 

• compile-time type checking: the compiler can check the validity of an RDF object at
compile time, which RDF APIs used from a conventional language will not guarantee.

• unique language constructs: Web of Data technology and standards are more easily 
adopted into RDF programming languages. 

• reflection: language reflection is made easier because everything is represented in 
RDF. 

• reuse: software can reference other software by means of URIs. 

There are many issues with this model that are not discussed here. For example, issues
surrounding security, data integrity, and computational resource consumption make
themselves immediately apparent. Many of these issues are discussed, to varying degrees of
detail, in the publications describing these languages.

4.3 Virtual Machine Repository 

In the virtual machine repository model, the Web of Process is made to behave like a 
general-purpose computer. In this model, software, data, and virtual machines are all en- 
coded in the Web of Process. The Fhat RDF virtual machine (RVM) is a virtual machine 
that is represented in RDF [49]. The Fhat RVM has an architecture that is similar to other 
high-level virtual machines such as the Java virtual machine (JVM). For example, it main- 
tains a program counter (e.g. a pointer to the current instruction being executed), various 
stacks (e.g. operand stack, return stack, etc.), variable frames (e.g. memory for declared 
variables), etc. However, while the Fhat RVM is represented in the Web of Process, it does 
not have the ability to alter its state without the support of some external process. An exter- 
nal process that has a reference to a Fhat RVM can alter it by moving its program location 
through a collection of instructions, by updating its stacks, by altering the objects in its 
heap, etc. Again, the Web of Process (and more generally, the Web of Data) is simply a 
data structure. While it can represent process information, it is up to machines external to 
the Web of Process to manipulate it and thus, alter its state. 
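The point that a virtual machine stored as data cannot advance on its own can be shown with a toy example. The sketch below is not the Fhat RVM: the three-instruction set, the ex: identifiers, and the dictionary standing in for the Web of Process are all invented for illustration. The machine state lives entirely in the store, and any external process holding a reference to it can execute the next instruction.

```python
def new_store():
    """A stand-in for the Web of Process: program and machine state as data."""
    return {
        ("ex:program", "ex:instructions"): [("push", 2), ("push", 3), ("add", None)],
        ("ex:rvm", "ex:programCounter"): 0,
        ("ex:rvm", "ex:operandStack"): [],
    }

def step(store, rvm="ex:rvm"):
    """Let an external process execute one instruction on behalf of the machine.

    Returns False when the program counter has moved past the last instruction.
    """
    program = store[("ex:program", "ex:instructions")]
    pc = store[(rvm, "ex:programCounter")]
    if pc >= len(program):
        return False
    op, arg = program[pc]
    stack = list(store[(rvm, "ex:operandStack")])
    if op == "push":
        stack.append(arg)
    elif op == "add":
        right, left = stack.pop(), stack.pop()
        stack.append(left + right)
    # Write the new machine state back to the store.
    store[(rvm, "ex:operandStack")] = stack
    store[(rvm, "ex:programCounter")] = pc + 1
    return True

store = new_store()

step(store)                                       # process A runs one instruction, then stops
frozen_pc = store[("ex:rvm", "ex:programCounter")]  # the machine is "frozen" here

while step(store):                                # process B resumes from the stored state
    pass

result = store[("ex:rvm", "ex:operandStack")]
print(frozen_pc, result)  # 1 [5]
```

Because step reads and writes only the store, the two "processes" need not share anything beyond a reference to the machine.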

In this computing model, a full computational stack is represented in the Web of
Process. Computing, at this level, is agnostic to the physical machines that support its
representation. The lowest levels of access are URIs and their RDF relations. There is no pointer
to physical memory, disks, network cards, video cards, etc. Such RDF software and RVMs
exist completely in an abstract URI and RDF address space, that is, in the Web of Process. In
this way, if an external process that is executing an RVM stops, the RVM simply "freezes"
at its current instruction location. The state of the RVM halts. Any other process with a
reference to that RVM can continue to execute it.¹⁸ Similarly, an RVM represented on one
physical machine can compute an object represented on another physical machine.
However, for the sake of efficiency, given that RDF subgraphs can be easily downloaded by a
physical machine, the RVMs can be migrated between data stores: the process is moved
to the data, not the data to the process. Many issues surrounding security, data integrity,
and computational resource consumption are discussed in [49]. Currently there exists the
concept, the consequences, and a prototype of an RVM. Future work in this area aims
to transform the Web of Process (and more generally, the Web of Data) into a massive-scale,
distributed, general-purpose computer.

¹⁸ In analogy, if the laws of physics stopped "executing" the world, the state of the world would "freeze"
awaiting the process to continue.

5 Conclusion 

A URI can denote anything. It can denote a term, a vertex, an instruction. However, by
itself, a single URI is not descriptive. When a URI is interpreted within the context of other
URIs and literals, it takes on a richer meaning and is more generally useful. RDF is the
means of creating this context. Both the URI and RDF form the foundational standards
of the Web of Data. From the perspective of the domain of knowledge representation and
reasoning, the Web of Data is a distributed knowledge base: a Semantic Web. In this
interpretation, according to whichever logic is used, existing knowledge can be used to infer new
knowledge. From the perspective of the domain of network analysis, the Web of Data is a
distributed multi-relational network: a Giant Global Graph. In this interpretation, network
algorithms provide structural statistics and can support network-based information retrieval
systems. From the perspective of the domain of object-oriented programming, the Web of
Data is a distributed object repository: a Web of Process. In this interpretation, a complete
computing environment exists that yields a general-purpose, Web-based, distributed
computer. For other domains, other interpretations of the Web of Data can exist. Ultimately,
the Web of Data can serve as a general-purpose medium for storing and relating all the
world's data. As such, machines can usher in a new era of global-scale data management
and processing.